From 10b30be2e376ccf5840f693abcc3cec2b326b8f1 Mon Sep 17 00:00:00 2001
From: jhudsl-robot
Date: Mon, 17 Jul 2023 20:37:13 +0000
Subject: [PATCH 1/2] Render bookdown

---
 docs/01-intro.md                        |  2 +-
 docs/03-writing_code.md                 |  2 +-
 docs/04-refactoring.md                  | 12 +++++++++---
 docs/05-annotation.md                   |  8 +++++++-
 docs/06-understanding_other_ppl_code.md |  6 ++++++
 docs/About.md                           |  2 +-
 docs/about-the-authors.html             |  2 +-
 docs/annotating-your-code.html          |  6 +++++-
 docs/introduction.html                  |  2 +-
 docs/refactoring-code.html              | 10 +++++++---
 docs/search_index.json                  |  2 +-
 docs/understanding-unfamiliar-code.html |  4 ++++
 docs/writing-code.html                  |  2 +-
 13 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/docs/01-intro.md b/docs/01-intro.md
index 426a9331..3dfa00fb 100644
--- a/docs/01-intro.md
+++ b/docs/01-intro.md
@@ -71,7 +71,7 @@ The methods used for chatbots have evolved over time. Now chatbots often utilize
 Despite the fact that chatbots have been around awhile, the popularity of OpenAI's ChatGPT and DALL-E programs has sparked a recent surge of interest. These chatbots are in part particularly powerful due to the fact that large amounts of computing power were used to train their NLP models on very large datasets [@caldarini2022literature; @cahn2017chatbot]. Large language model AIs can be divided into two categories: those that can be reached using an internet browser, and those that can be reached using an integrated development environment (IDE).

 ::: warning
-The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved and how sensitive or personally identifiable information are protected.
+The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown.

 Err on the side of caution when interacting with them. We do **not** recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure.
 :::
diff --git a/docs/03-writing_code.md b/docs/03-writing_code.md
index f64a7b9c..d171f1e4 100644
--- a/docs/03-writing_code.md
+++ b/docs/03-writing_code.md
@@ -538,7 +538,7 @@ devtools::session_info()
 ## collate en_US.UTF-8
 ## ctype en_US.UTF-8
 ## tz Etc/UTC
-## date 2023-07-14
+## date 2023-07-17
 ##
 ## ─ Packages ───────────────────────────────────────────────────────────────────
 ## package * version date lib source
diff --git a/docs/04-refactoring.md b/docs/04-refactoring.md
index 0bb0813a..44c32325 100644
--- a/docs/04-refactoring.md
+++ b/docs/04-refactoring.md
@@ -30,6 +30,12 @@ However, AI has significant potential to help with code refactoring. AI can use

 AI refactoring is also faster and more accurate than manual refactoring. This is particularly useful for large-scale software projects with massive codebases, where manual code review and refactoring can be an enormous task. In the next sections, we'll take a look at some examples of using AI to refactor code.

+::: warning
+The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown.
+
+Err on the side of caution when interacting with them. We do **not** recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure.
+:::
+
 ## Example One: Correcting Syntax

 Programming languages operate using their established set of rules. For example, [Python](https://www.python.org/) uses indentation to define code blocks. However, when you shift between development environments, indentation and line breaks can sometimes be affected, causing major headaches!
@@ -547,7 +553,7 @@ proc.time() - start_time

 ```
 ## user system elapsed
-## 12.208 0.000 12.208
+## 12.039 0.001 12.041
 ```

 :::{.query}
@@ -575,7 +581,7 @@ proc.time() - start_time

 ```
 ## user system elapsed
-## 0.575 0.327 0.598
+## 0.650 0.356 0.680
 ```

 The `outer()` function performs the same calculation as the nested loop in the original code, but more efficiently. It returns a matrix of all possible combinations of x and y values, with each element of the matrix being the product of the corresponding x and y values. The `rowSums()` function is then used to sum the elements of each row of the matrix, which is equivalent to summing the products of x and y for each index `i` in the original loop. This method avoids the need for the nested loop, resulting in a faster and more efficient computation.
@@ -603,7 +609,7 @@ proc.time() - start_time

 ```
 ## user system elapsed
-## 0.265 0.288 0.350
+## 0.334 0.284 0.396
 ```

 One optimized way to perform the same calculation is by using the `%*%` operator to perform matrix multiplication. This can be done by converting x and y to matrices and transposing one of them so that their dimensions align for matrix multiplication. This code should be much faster than the original implementation because it takes advantage of highly optimized matrix multiplication algorithms in R.
diff --git a/docs/05-annotation.md b/docs/05-annotation.md
index f63cd8d7..337a5535 100644
--- a/docs/05-annotation.md
+++ b/docs/05-annotation.md
@@ -19,6 +19,12 @@ There are many benefits to annotating code:
 - Improves maintainability
 - Improves quality

+::: warning
+The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown.
+
+Err on the side of caution when interacting with them. We do **not** recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure.
+:::
+
 ## Improves readability

 Dino says It’s great this code is well annotated using AI. I totally forgot what I was doing with this code when I wrote it.
@@ -368,7 +374,7 @@ devtools::session_info()
 ## collate en_US.UTF-8
 ## ctype en_US.UTF-8
 ## tz Etc/UTC
-## date 2023-07-14
+## date 2023-07-17
 ##
 ## ─ Packages ───────────────────────────────────────────────────────────────────
 ## package * version date lib source
diff --git a/docs/06-understanding_other_ppl_code.md b/docs/06-understanding_other_ppl_code.md
index f47dab84..50c2b820 100644
--- a/docs/06-understanding_other_ppl_code.md
+++ b/docs/06-understanding_other_ppl_code.md
@@ -41,6 +41,12 @@ Keep these ethical considerations in mind when using AI to summarize code.
 **Does the code contain any sensitive or personal information?** If the code contains any sensitive or personal information, you should take special care to protect that information.
 :::

+::: warning
+The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown.
+
+Err on the side of caution when interacting with them. We do **not** recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure.
+:::
+
 ## Example One: Summarizing Code You Didn't Write

 Here's some simple code using the R programming language, written by an unnamed programmer. If you are familiar with R, you might be able to understand what this code does with little effort. However, if you don't know R, you're relatively new to R, or you're just having a bad day, understanding what this bit of code can take a great deal more energy and time. AI language bots can ease that burden.
diff --git a/docs/About.md b/docs/About.md
index 6a78aef5..3b72d86d 100644
--- a/docs/About.md
+++ b/docs/About.md
@@ -38,7 +38,7 @@ Tools Used:
 ## collate en_US.UTF-8
 ## ctype en_US.UTF-8
 ## tz Etc/UTC
-## date 2023-07-14
+## date 2023-07-17
 ##
 ## ─ Packages ───────────────────────────────────────────────────────────────────
 ## package * version date lib source
diff --git a/docs/about-the-authors.html b/docs/about-the-authors.html
index 7ec44d94..03accfec 100644
--- a/docs/about-the-authors.html
+++ b/docs/about-the-authors.html
@@ -471,7 +471,7 @@ About the Authors
 ## collate en_US.UTF-8
 ## ctype en_US.UTF-8
 ## tz Etc/UTC
-## date 2023-07-14
+## date 2023-07-17
 ##
 ## ─ Packages ───────────────────────────────────────────────────────────────────
 ## package * version date lib source
diff --git a/docs/annotating-your-code.html b/docs/annotating-your-code.html
index 667908a8..72976d41 100644
--- a/docs/annotating-your-code.html
+++ b/docs/annotating-your-code.html
@@ -406,6 +406,10 @@

5.2 Annotating Code is Good Pract
  • Improves maintainability
  • Improves quality
  • +
    +

    The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown.

    +

    Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure.

    +

    5.3 Improves readability

    @@ -644,7 +648,7 @@

    5.10 Summary

 ## collate en_US.UTF-8
 ## ctype en_US.UTF-8
 ## tz Etc/UTC
-## date 2023-07-14
+## date 2023-07-17
 ##
 ## ─ Packages ───────────────────────────────────────────────────────────────────
 ## package * version date lib source
diff --git a/docs/introduction.html b/docs/introduction.html
index 236e8db2..6fc32579 100644
--- a/docs/introduction.html
+++ b/docs/introduction.html
@@ -433,7 +433,7 @@

    1.4 The AI Chatbots

    The methods used for chatbots have evolved over time. Now chatbots often utilize AI methods like deep learning (which involve multiple layers of abstractions of the input data (“Deep Learning” 2023)) to extract meaning from the language data (“Natural Language Processing” 2023). As these methods use large quantities of text, they are therefore often called large language models (“Large Language Model” 2023).

    Despite the fact that chatbots have been around awhile, the popularity of OpenAI’s ChatGPT and DALL-E programs has sparked a recent surge of interest. These chatbots are in part particularly powerful due to the fact that large amounts of computing power were used to train their NLP models on very large datasets (Caldarini, Jaf, and McGarry 2022; Cahn 2017). Large language model AIs can be divided into two categories: those that can be reached using an internet browser, and those that can be reached using an integrated development environment (IDE).

    -

    The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved and how sensitive or personally identifiable information are protected.

    +

    The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown.

    Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure.

    The dinosaur cartoon says in a speech bubble 'Wow, there are already so many options!' as names of AI tools swirl around the dinosaur's head.

diff --git a/docs/refactoring-code.html b/docs/refactoring-code.html
index 746986ce..bbea2a6c 100644
--- a/docs/refactoring-code.html
+++ b/docs/refactoring-code.html
@@ -407,6 +407,10 @@

    4.3 AI in Code Refactoring

    Code refactoring has historically been done manually by developers. This involves reviewing code and identifying areas that could be improved or optimized, and then making changes to the codebase accordingly. Though important, this process is time-consuming and labor-intensive, as it requires developers to carefully review every line of code to identify potential issues or areas for improvement. Additionally, manual code refactoring is error-prone, as developers can accidentally introduce bugs or errors into the codebase while making changes.

    However, AI has significant potential to help with code refactoring. AI can use machine learning algorithms to analyze large amounts of code and identify patterns or areas that could be improved. For example, they can identify sections of code that are redundant, overly complex, or difficult to maintain, and suggest changes that could be made to improve the codebase. Machine learning algorithms can also help to identify potential bugs or security issues in the codebase, which can help to improve the overall quality and stability of the software.

    AI refactoring is also faster and more accurate than manual refactoring. This is particularly useful for large-scale software projects with massive codebases, where manual code review and refactoring can be an enormous task. In the next sections, we’ll take a look at some examples of using AI to refactor code.

    +
    +

    The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown.

    +

    Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure.

    +

    4.4 Example One: Correcting Syntax

    @@ -762,7 +766,7 @@

    4.10 Code optimization

    # End timer
    proc.time() - start_time
    ##    user  system elapsed 
    -##  12.208   0.000  12.208
    +## 12.039 0.001 12.041

    What is a faster, more optimized way of running the following R code?

    @@ -781,7 +785,7 @@

    4.10 Code optimization

    # End timer
    proc.time() - start_time
    ##    user  system elapsed 
    -##   0.575   0.327   0.598
    +## 0.650 0.356 0.680

    The outer() function performs the same calculation as the nested loop in the original code, but more efficiently. It returns a matrix of all possible combinations of x and y values, with each element of the matrix being the product of the corresponding x and y values. The rowSums() function is then used to sum the elements of each row of the matrix, which is equivalent to summing the products of x and y for each index i in the original loop. This method avoids the need for the nested loop, resulting in a faster and more efficient computation.

    So much faster! We can look at another option by regenerating the response:

    @@ -800,7 +804,7 @@

    4.10 Code optimization

    # End timer
    proc.time() - start_time
    ##    user  system elapsed 
    -##   0.265   0.288   0.350
    +## 0.334 0.284 0.396

    One optimized way to perform the same calculation is by using the %*% operator to perform matrix multiplication. This can be done by converting x and y to matrices and transposing one of them so that their dimensions align for matrix multiplication. This code should be much faster than the original implementation because it takes advantage of highly optimized matrix multiplication algorithms in R.

    While this second suggestion is faster, you will need to consider what aspects of the codebase are most important in each instance. For example, this code runs more quickly, but the %*% operator might be unfamiliar to some R programmers. In cases where efficiency is less important, or the data are not large, you might consider maximizing readability.

    diff --git a/docs/search_index.json b/docs/search_index.json index e55b3b8d..42c84898 100644 --- a/docs/search_index.json +++ b/docs/search_index.json @@ -1 +1 @@ -[["index.html", "AI for Software Development Harnessing the Power of Large Language Models About this Course Available course formats Suggested Activity Context Learning Objectives Lesson Breakdown", " AI for Software Development Harnessing the Power of Large Language Models July, 2023 About this Course This course on AI for software development explores the use of AI large language models such as ChatGPT, Bard, and others and their potential benefits and challenges. Through examples and hands-on activities, students will develop an understanding of the ways in which AI can speed up software development tasks and free up time for more creative and strategic work. By the end of the course, students will be equipped to navigate the rapidly changing landscape of software development and use AI chatbots in a way that maximizes benefits and efficiency while limiting harm as much as possible. Available course formats This course is available in multiple formats which allows you to take it in the way that best suits your needs. The material for this course can be viewed without login requirement on this Bookdown website. This format might be most appropriate for you if you rely on screen-reader technology. Our courses are open source, you can find the source material for this course on GitHub. The slides for the images in the course can be found here. Suggested Activity Context Course Audience Professionals looking to improve efficiency Students hoping to learn more about programming Anyone curious about how AI can be harnessed for technology Note: Those completely new to programming will find using AI tools to create software solutions challenging at this time. It is helpful to have some fundamental knowledge to write appropriate prompts and to discern when code is not working as expected. 
We advise novices to seek expert review. Course Prerequisites Some previous exposure to programming recommended. Assessment Type Hands-on activities following each lesson. Learning Objectives Learners will be able to: Explain the basics of AI and its potential for improving software development workflows Implement strategies to write and debug code using AI-based tools Describe the benefits of refactoring code using AI-powered techniques, such as making code readable, keeping it brief, and optimizing code Apply best practices for annotating code using AI Recognize strategies for using AI-based tools to understand and analyze code, such as code comprehension of unfamiliar languages or functions Discuss the challenges and ethical implications of using AI for different aspects of software development Lesson Breakdown Activity Duration Introduction 10 minutes Ethics for Using AI 20 minutes Writing Code 35 minutes Writing Exercise 15 minutes Refactoring Code 35 minutes Refactoring Exercise 15 minutes Annotating Code 25 minutes Understanding Code 25 minutes Understanding Exercise 10 minutes Total 3 hours & 10 minutes "],["introduction.html", "Chapter 1 Introduction 1.1 Learning Objectives 1.2 AI Provides New Opportunities 1.3 AI Code of Ethics 1.4 The AI Chatbots 1.5 Reproducibility With AI Chatbots 1.6 How to Use This Book 1.7 About This Book", " Chapter 1 Introduction The increasing availability of AI chatbots is creating a new landscape in software development and data science. It is important to be aware of the potential benefits and challenges of this technology, so that we can use it in a way that benefits everyone and limits harm as much as possible. On the one hand, AI chatbots have the potential to replace or significantly reduce the workload of programmers. This could lead to job losses in the data science industry. Additionally, there are significant ethical and privacy concerns regarding the use of training data sets to teach these AI systems. 
For example, if a chatbot is trained on a dataset of biased or discriminatory data, it could perpetuate those biases in its interactions with users. On the other hand, AI chatbots also have the potential to create new jobs and opportunities. For example, chatbot developers will be needed to create and maintain these AI systems. Additionally, AI chatbots could be used to automate tasks that are currently done by humans, freeing up programmers to focus on more creative and strategic work of interpreting what the data actually mean. Ultimately, while the future of programming in software development and data science is unclear, there is much to be optimistic about in this brave new world. The advancements in AI and chatbot technology present an opportunity for programmers to work alongside algorithms to create innovative solutions that were previously unimaginable. As with any technological revolution, there will be challenges, but it is up to us to work together to ensure that we use these tools ethically and responsibly, and to create a future that benefits everyone. 1.1 Learning Objectives In this chapter we will demonstrate how to: Consider the potential benefits and limitations of AI for software development Recognize how to use this course, how it was written, and what to expect Explain general concepts for ethical use of AI tools for software development Summarize what Chatbots are and how they work Distinguish between different AI options for software development 1.2 AI Provides New Opportunities We do not believe that AI chatbots will replace programmers, software developers, and data scientists. While AI can be powerful and useful, it is only as good as the training data behind it and the human using it. Therefore, for the time being and the foreseeable future, humans will remain at the forefront of programming innovation, with AI providing valuable support. 
As new code is often built on top of existing code written by humans, it is unlikely that AI will fully replace the need for human programmers. However, we do think that the roles of programmers in software development and data science are changing, and some of the skills required to work in this new environment are slightly different from those needed before. In addition to proficiency in programming syntax, programmers may need to develop a more refined ability to construct queries that enable AI bots to create the desired workflows. Other critical skills such as debugging, troubleshooting, and documentation will continue to be valuable, and can be enhanced through the use of AI. Furthermore, AI creates new opportunities and niches for tasks that were previously difficult, time-consuming, or impossible. This is an exciting time for those working in programming, with great potential to explore new and innovative solutions. As AI technology continues to advance, its role in programming will only become more significant, and it is up to developers to adapt and learn how to use these new tools effectively. We asked OpenAI’s ChatGPT whether AI chatbots would replace human software developers, data scientists, and programmers, and this was the response: “It is unlikely that AI chatbots will replace software developers, data scientists, and programmers entirely. While AI chatbots can perform certain tasks related to software development, data analysis, and programming, they are limited in their abilities and require human oversight and input to function properly. AI chatbots can be used to automate certain tasks such as generating simple code snippets, running tests, and performing simple data analysis. They can also assist in the development process by providing quick answers to commonly asked questions and suggesting possible solutions to programming problems. 
However, more complex software development, data analysis, and programming tasks require a level of creativity, critical thinking, and problem-solving that AI chatbots cannot currently replicate. Software developers, data scientists, and programmers have a deep understanding of the technologies and tools used to develop AI chatbots, as well as the ability to optimize and improve those tools. Furthermore, software development, data analysis, and programming require a high level of human communication and collaboration, which AI chatbots cannot currently replicate. Humans are also essential in making ethical decisions related to AI development, such as ensuring that AI systems are transparent, fair, and unbiased. In summary, while AI chatbots can provide valuable assistance to software developers, data scientists, and programmers, they are unlikely to replace them entirely. Human expertise, creativity, and critical thinking will continue to be essential in developing and optimizing AI systems for the foreseeable future.” 1.3 AI Code of Ethics Our first duty as programmers is to avoid causing harm to people and the world. This includes avoiding both intentional and inadvertent harm. Intentional harm is harm that is caused by a programmer’s deliberate actions. For example, a programmer might create a virus that is designed to harm computers. Inadvertent harm is harm that is caused by a programmer’s negligence or lack of foresight. For example, a programmer might create a program that is buggy and causes data loss, or create a program that is particularly vulnerable to being hacked. We must always be aware of the potential for harm and deliberately take steps to mitigate it. This includes things like: Be aware of the potential for bias. AI systems are trained on data, and that data can reflect the biases of the people who collected it. Programmers should be aware of the potential for bias in their data and take steps to mitigate it. Be transparent about your work. 
Share information about your code, data, and algorithms. This will help others to understand how your work works and to identify any potential problems. Be accountable for your work. Take responsibility for the consequences of your code. Be willing to make changes if necessary. Participate in discussions about AI ethics. There are many online and in-person communities where you can discuss the ethical implications of AI with other people. Advocate for ethical AI. Speak up when you see AI being used in a way that you believe is unethical. Support organizations that are working to promote ethical AI. 1.4 The AI Chatbots Humans have been interacting with AI chatbots for years. In fact, Alan Turing is credited with coming up with the concept for chatbots as early as 1950. Chatbots are software-based systems that interact with humans typically by text or speech inputs, rather than code. They mimic some human activity (“Chatbot” 2023; Abdulla et al. 2022) based on these language inputs. They process the inputs using natural language processing commonly abbreviated as NLP. NLP is a kind of AI that uses human text or speech and parses the language to determine structures and patterns to extract meaning. NLP uses large amounts of language data (such as books, websites etc.) to train AI systems to identify these structures and patterns. For example, the AI model might identify when a sentence is a question or a statement by examining various features in a prompt such as the inclusion of a question mark of the use of words often used in questions (“Natural Language Processing” 2023; Cahn 2017). The methods used for chatbots have evolved over time. Now chatbots often utilize AI methods like deep learning (which involve multiple layers of abstractions of the input data (“Deep Learning” 2023)) to extract meaning from the language data (“Natural Language Processing” 2023). 
As these methods use large quantities of text, they are therefore often called large language models (“Large Language Model” 2023). Despite the fact that chatbots have been around awhile, the popularity of OpenAI’s ChatGPT and DALL-E programs has sparked a recent surge of interest. These chatbots are in part particularly powerful due to the fact that large amounts of computing power were used to train their NLP models on very large datasets (Caldarini, Jaf, and McGarry 2022; Cahn 2017). Large language model AIs can be divided into two categories: those that can be reached using an internet browser, and those that can be reached using an integrated development environment (IDE). The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved and how sensitive or personally identifiable information are protected. Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure. This is not a comprehensive list! Browser-based chatbots Bard is Google’s large language model, also known as a conversational AI or chatbot trained to be informative and comprehensive. It is trained on a massive amount of text data and is able to communicate and generate human-like text in response to a wide range of prompts and questions. You will need to join the waiting list before getting access, although in our experience the wait time is short. ChatGPT is OpenAI’s large language model based on the GPT (Generative Pre-trained Transformer) architecture. It is designed to understand and generate human-like responses to text-based conversations. You will need to create an account before using it. 
Phind is a AI-powered search engine designed specifically for developers asking technical questions. In addition to answering your coding question, it will provide links to the internet sources used to find the answer. Poe is Quora’s platform that gives you access to multiple AI chatbots, including Sage, based on OpenAI’s GPT-3.5 architecture; Claude, an AI natural language bot created by Anthropic; and ChatGPT. IDE-based chatbots CoPilot by GitHub is a language bot that interfaces with your editor. This bot provides coding suggestions and ways to speed up testing. GitHub describes it as an “AI pair programmer”. It uses a subscription model with both individual and business tiers. CodeWhisperer is the Amazon Web Services chatbot for the AWS cloud platform. It can be installed in your integrated development environment (IDE) using the latest AWS Toolkit plugin. The individual tier for CodeWhisperer is free, although you will need to create an AWS account and add a payment method. What are the data sources for these chatbots? Some high-profile English-language AIs, including Google’s T5 and Facebook’s LLaMA, were trained on more than 15 million websites (Schaul, Chen, and Tiku 2023). The top sources of training data for these AI models were https://patents.google.com/ and https://www.wikipedia.org/ ! 1.5 Reproducibility With AI Chatbots Reproducibility in programming is the ability to reproduce the results of an analysis using the same data and methods. It’s an important aspect of working with AI tools as well, particularly when using AI to debug or fix code. In order to maximize the reproducibility of your work, you should create good habits when working with any AI tool. This includes documenting your prompts and the intentions behind them, as well as saving the response threads generated by an AI chatbot. Some of the current chatbots already save your threads (particularly ChatGPT and Phind). 
At this time, Bard and the chatbots accessible via Poe do not save queries as individual threads but do keep a record of the questions and responses for some time. However, it is best practice to save them yourself as the saved chat threads can be reset. Saving them yourself also allows you to write notes about your intentions. Keep in mind that the answers from the chatbots can change over time - don’t expect them to always be the same! Phrasing a query slightly differently can result in a different answer, as can asking in a new session. You should expect to ask the AI tool the same question in multiple ways in order to be confident in the answer. 1.6 How to Use This Book This book is written with the expectation that users will follow along and paste the queries from each example into their own AI chatbot session. After you have finished this book, you will be able to use whichever combination of chatbot and programming environment that you’re comfortable using. 1.6.1 Choice of AI Model You can use whichever AI you like for this course, as long as it is a large language model. (Other types of AIs are not explored in this course.) We will use a variety of browser-based chatbots as examples throughout this course. Some AIs will be better suited for your particular tasks, and it is too early to judge which of the technologies will end up dominating the industry. We encourage you to test multiple AI platforms to find the ones that best fit your needs, and the use of a particular AI in each chapter should not be taken as an endorsement by us. We have focused on the browser-based chatbots for our examples in this course for ease of demonstration. 1.6.2 Computing Environment Needs All the examples and activities in this course can be done in the computer environment you prefer. 
None of the tasks require a huge amount of computing power, so local installations on your personal computer are fine, though everything will also translate to a high performance computing cluster or a cloud computing platform. In addition to the CodeWhisperer chatbot available on the AWS cloud platform, a variety of cloud-based AI tools and platforms exist that can be leveraged for practical projects after this course. For instance, Amazon SageMaker or Google Cloud AI Platform can be used for constructing and training machine learning models, while Microsoft Azure Cognitive Services can be applied to incorporate natural language processing features into your applications. If you’d like to learn more about how AI can be integrated with cloud computing, you can read about it in the Additional Learning section. 1.6.3 Boxes Throughout this course, you will see several types of specially colored boxes. In particular, there are two to which you should pay special attention. One box highlights potential ethical issues with AI usage. In this box, you’ll find reminders of the ethical considerations you should be particularly aware of while using AI to do certain tasks. The other box highlights our queries to AI chatbots. The question we posed to the AI chatbot is written in these boxes to make it easier for you to find. Wherever possible, we use screenshots of the actual answers we got from the AI chatbots in response to our questions. Occasionally when we have asked for the AI to write or fix code, we have formatted the answer in such a way that makes it easier to use. When you see this format, you will know it indicates an AI response. Code in these boxes was written by AI. 1.7 About This Book We used AI bots (including ChatGPT, Bard, Sage, and Claude) to develop and write much of this course.
We firmly believe this new technology can be used in ethical ways, and we put into practice all the rules and suggestions offered over the following modules while we created this book. "],["ethics-of-using-ai.html", "Chapter 2 Ethics of Using AI 2.1 Learning Objectives 2.2 Major concerns 2.3 Replacing Humans 2.4 Bias 2.5 Misinformation 2.6 Unintelligible Code 2.7 Faulty or Buggy Code 2.8 Security and Privacy issues 2.9 Violating Copyright 2.10 Harmful code 2.11 Summary", " Chapter 2 Ethics of Using AI The use of artificial intelligence (AI) and in particular, generative AI, in coding has raised a number of ethical concerns. We will highlight several current concerns; however, please be aware that this is a dynamic field and the possible implications of this technology are continuing to develop. It is critical that we as a society continue to evaluate and predict what the consequences of the use of AI will be, so that we can mitigate harmful effects. 2.1 Learning Objectives In this chapter we will demonstrate how to: Describe key ethical concerns for using AI tools in software development Identify possible mitigation strategies for these major concerns Explain the potential benefits of being transparent about the use of AI tools to write code Discuss why human contributions are still important and necessary Recognize strategies for using AI tools to write code more responsibly Implement prompts to ask AI tools about responsible use throughout various coding practices 2.2 Major concerns In this chapter we will discuss the following issues that writing code with AI tools may contribute to: Replacing Humans - AI tools can help humans write code, but they are not a replacement. Bias - AI models are built on data and code that were created by biased humans, and this bias can be further perpetuated. Misinformation - fake or manipulated data used to help design algorithms could be believed to be correct and this could be further propagated.
Unintelligible Code - Developers need to follow best practices with code generated by AI. Just like code generated by human developers, this includes figuring out how the code works, how it integrates with other code, as well as annotating and documenting how to use the code. Faulty or Buggy Code - Code that may not be optimal for a given situation may be inadvertently used by those with less familiarity, which may result in faulty software. Security or Privacy Issues - Code may not be optimized for security if not adequately evaluated. Generative AI models have access to large amounts of data that is currently unregulated and may include data that should be protected for privacy reasons. Copyright Violations - The code used for the generative AI model may include code covered by copyright terms that require attribution or do not allow reuse, and it may not be clear where the code came from. Harmful Code - Currently it is not clear how well generative AI models restrict the creation of code that will be used for goals that harm others. Note that this is an incomplete list; additional ethical concerns will become apparent as we continue to use these new technologies. We highly suggest that users of these tools be transparent about their use, so that as new ethical issues emerge, we will be better prepared to understand the implications for specific coding projects. Be transparent about what AI tools you use to write code. This helps others to better understand how you created your code, as well as the possible sources that the AI tools might have used when helping you write code. It may also help with future unknown issues related to the use of these tools. Some organizations and scientific societies have created guidelines or requirements for using AI in journal articles and conference submissions, like the International Society for Computational Biology. Be aware of the requirements/guidelines for your field.
It is essential to address these ethical concerns and ensure that the use of AI in coding is done in a responsible and transparent manner. This could be done through ensuring the quality of the data used to train AI systems, promoting transparency in AI-generated code, and implementing safeguards against the creation of harmful or biased code. By doing so, we can harness the potential of AI to improve and transform the way we write and optimize code while maintaining ethical standards. 2.3 Replacing Humans Those who use AI tools to write code or to understand code need to recognize their own value in the process. While AI systems are useful, they do not replace the strengths that humans have for innovating new ways to write code, evaluating how the code integrates into the larger picture of a project, or in evaluating the downstream consequences of the code. Computer science is a field that has historically lacked diversity, and new learners can often feel intimidated. There is the potential that new learners may feel even more discouraged as they learn to write code when witnessing AI tools write code. It is critical that we support diverse new learners of computer science, as we will continue to need human involvement in the development and use of AI tools. 2.3.1 Tips for supporting human code Follow best teaching practices for inclusion of diverse learners. Encourage current and future programmers to continue to innovate and write code. Avoid thinking that code written by computers must be better than code written by humans, as this simply is not true. Recall that humans first wrote the code that the AI tools use to write code. Be transparent about when you do or do not use AI tools, and give credit to the humans who write code where possible. A new term in the medical field called AI paternalism describes the concept that doctors (and others) may trust AI over their own judgment or the experiences of the patients they treat.
This has already been shown to be a problem with earlier AI systems intended to help distinguish patient groups. Not all humans will necessarily fit the expectations of the AI model if it is not very good at predicting edge cases (Hamzelou n.d.). Therefore, in all fields it is important for us to not forget our value as humans in our understanding of the world. 2.4 Bias One of the biggest concerns is the potential for AI to create biased code. AI systems are trained on data created by humans. If the data used to train the system is biased (and this includes existing code that may be written in a biased manner), the resulting code could also be biased. This could lead to discrimination, abuse, or neglect for certain groups of people, such as those with certain ethnic or cultural backgrounds, genders, ages, sexuality, capabilities, religions or other group affiliations. It is well known that data and code are often biased (Belenguer 2022). It is important that evaluations be made throughout the software development process to check for and consider potential perpetuation of bias. 2.4.1 Tips for avoiding bias Be aware of the potential biases in the data that is used to train AI systems. Check for possible biases within the data. Are there harmful data values? Examples could include discriminatory and false associations. Are the data adequately inclusive? Examples could include a lack of data about certain ethnic or gender groups or disabled individuals, which could result in code that does not adequately consider these groups, ignores them altogether, or makes false associations. Are the data of high enough quality? Examples could include data that is false about certain individuals. Evaluate the code for biases as it is developed. Check if any of the criteria for weighting certain data values over others are rooted in bias. Consider the possible outcomes of the use of the code.
Consider if the code could possibly be used in a manner that will result in discrimination. See Belenguer (2022) for more guidance. We also encourage you to check out the following video for a classic example of bias in AI: 2.5 Misinformation AI tools use data that may contain false information. AI tools may also report data as if it is real, when it is in fact not real. For example, currently at the time of the writing of this course, ChatGPT will report citations with links that are not correct. 2.5.1 Tips for reducing misinformation Be aware that some AI tools currently make up false information based on artifacts of the algorithm. Do not assume that everything that the algorithm produces is real or correct. 2.6 Unintelligible Code There is risk that those less aware of best coding practices use AI-generated code and do not follow these practices. This could make it difficult for others to understand how the code works and could make it hard to identify and fix any issues that may arise. This could result in negative consequences, such as system crashes or security breaches, that could have been avoided if the code had been written by an experienced and savvy human programmer. 2.6.1 Tips for avoiding unintelligible code Code should be reviewed by experienced programmers. Code should be annotated throughout to explain what the code is doing. Documentation should be created that describes how to use the code properly. Ask AI tools to help you understand the code, but get expert assistance where needed. Can you explain what the code in line 4 is doing? 2.7 Faulty or Buggy Code There is a risk that code written with AI tools, if not reviewed carefully enough, may be faulty. AI systems may not be trained on data that relates to current software dependencies or best practices. It is also important to remember that code generated by AI tools is not necessarily better than code written by humans. It requires just as much code review. 
2.7.1 Tips for reducing faulty or buggy code Do not assume that the code generated by AI is correct. Realize that AI is only as good or up-to-date as what it was trained on; the code may be generated using out-of-date code. Look up packages and functions used to check that the code is up-to-date. Make sure that you understand the code that you are using. AI can be used to help you understand what the code is doing, but consult with experts when needed. Ask the AI tools for extra information about whether there are any potential limitations or weaknesses in the code, but keep in mind that the tool may not be aware of issues and therefore human review is required. The information provided by the tool can however be a helpful starting point. Are there any limitations associated with this code? What assumptions were made in writing this code? Stack Overflow, a popular community-based website where programmers help one another, has (at the time of writing this) temporarily banned users from answering questions with AI-generated code. This is because users were posting incorrect answers to questions. It is important to follow this policy (as you may face removal from the community). This policy goes to show that you really need to check the code that you get from AI models. While they are currently helpful tools, they do not know everything. 2.8 Security and Privacy issues AI tools are not always trained in a way that is particularly conscious of data security. Furthermore, if the code is written using these tools by users who are less familiar with coding security concerns, protected data or important passwords may be leaked within the code itself. AI systems may also utilize data that was actually intended to be private. It is also important to consider what data your code might actually be using.
2.8.1 Tips for reducing security and privacy issues Check that all passwords, access tokens (like API keys), and security certificates are not saved in a public place where anyone can access or tamper with them. Check that no sensitive data, such as Personally Identifiable Information (PII), becomes public through the code. Utilize encryption and other security practices where needed. Consult with an expert about data security if you think your code could possibly cause someone to access protected data who is not authorized to access it. Advocate for regulation of AI tools by voting for such standards where possible. Ask AI tools for help, but do not rely on them alone. Are there any security or privacy concerns associated with this code? 2.9 Violating Copyright When AI systems are trained on data, they may also learn and incorporate code from that data. This means that AI-generated code could potentially infringe on the copyright of the original author of the code. For example, if an AI system is trained on a GitHub repository that contains code written by a human programmer, the AI system could generate code that is identical to or similar to the code in the GitHub repository. If the AI system then uses this code without permission from the original author, this could constitute copyright infringement. In general, we want programmers to feel comfortable sharing their code openly without fear they won’t be credited. Similarly, AI systems could potentially infringe on intellectual property rights by using code that is protected by trademarks or patents. For example, if an AI system is trained on a training manual that contains code that is protected by a trademark, the AI system could generate code that is identical to or similar to the code in the training manual. If the AI system then uses this code without permission from the trademark owner, this could constitute trademark infringement.
2.9.1 Tips for avoiding copyright violations Be transparent about what AI tools you use to write your code. Obtain permission from the copyright holders of any code that you use to train the AI system. Only use code that is in the public domain or that has been licensed for use by the AI system’s owner. Cite any GitHub repositories or training manuals you might use in your code. Ask the AI tools if the code it helped generate used any code that you can cite. Did this code use any code from others that I can cite? 2.10 Harmful code Another major concern is the use of AI to generate malicious code. For instance, AI could be used to create code that spreads malware or hacks into computer systems. This could cause severe damage to individuals and organizations, including data breaches and financial losses. 2.10.1 Tips for avoiding the creation of harmful code Be careful about what code you share publicly, as it could be used for malicious purposes. Consider how the code might be used by others. Ask the AI tools to help you, but do not rely on them alone. What are the possible downstream uses of this code? What are some possible negative consequences of using this code? 2.11 Summary Here is a summary of all the tips we suggested: Disclose when you use AI tools to write code. Credit human authors by citing their code and adhering to copyright restrictions. Ensure the code doesn’t expose sensitive data, such as passwords or PII. Don’t assume AI-generated content is real, accurate, or current. Ask the AI tools to help you understand: Sources for the code that you can cite What the code is doing Code limitations Potential security or privacy issues Potential downstream consequences of the code Always have expert humans review the code and value your own contributions and thoughts. Overall, we hope that these guidelines and tips will help us all to use AI tools more responsibly for writing code. 
We recognize, however, that this is an emerging technology and more ethical issues will emerge as we continue to use these tools in new ways. AI tools can even help us to use them more responsibly when we ask the right additional questions when writing code, but remember that human review is always necessary. Staying up-to-date on the current ethical considerations will also help us all continue to use AI responsibly. "],["writing-code.html", "Chapter 3 Writing Code 3.1 Learning Objectives 3.2 Coding Basics 3.3 Writing Code with AI 3.4 Tips for Coding with AI 3.5 Example One: Creating a Function 3.6 Example Two: Debugging Code 3.7 Example Three: Comparing Syntax 3.8 Example Four: Figuring Out a Common Task 3.9 Example Five: Planning Your Program 3.10 Example Six: Selecting a Coding Language 3.11 Hands On Exercise 3.12 Summary", " Chapter 3 Writing Code 3.1 Learning Objectives Recognize the benefits and capabilities of AI-powered programming tools, particularly for certain kinds of tasks that are difficult for humans Explain how AI-generated code and automated programming tools contribute to making programming more accessible to individuals with limited coding experience or to those working in a new language Explore the ethical considerations associated with using AI in programming, including the importance of rigorous testing before publishing code Apply strategies for using AI to build functions, debug code, understand error messages, check syntax, choose the proper programming language, and plan out a multi-step programming project 3.2 Coding Basics In today’s digitally-driven world, coding has become an essential skill. It is used to create websites, mobile applications, and sophisticated software systems. Contrary to popular belief, coding is not only for those who possess an innate talent for programming. With the right resources and a willingness to learn, anyone can become proficient in coding.
Practice is crucial to mastering the art of coding, and those who are willing to commit themselves to the learning process can achieve great results. Writing code can also be a rewarding and fulfilling hobby. Creating your own software, building a website, or automating a task can be immensely satisfying, and can provide a sense of accomplishment and pride. Moreover, writing code can be a form of creative expression. Just as musicians use notes to compose music and artists use paint to create visual art, programmers use code to create digital experiences. From building websites to creating video games, code can be used to bring ideas to life in exciting and innovative ways. Ultimately, the ability to write code is a valuable asset in today’s world, with increasing demand for skilled programmers in a wide range of industries. Whether as a career or hobby, coding can be both personally and professionally rewarding. The accessibility and versatility of coding make it an excellent skill for anyone to learn, regardless of their background or previous experience. 3.3 Writing Code with AI AI-powered programming tools and technologies are revolutionizing the way we approach software development. AI can generate code snippets. An AI-powered code editor can suggest more efficient algorithms or highlight potential errors in real-time. It can also help developers to reproduce bugs more easily and to test potential fixes more quickly. This can save developers a significant amount of time and effort, and it can help them to debug their code more effectively. Be aware that AI still makes a lot of mistakes when writing code, so you will need to be particularly careful in testing code it writes. AI can make programming more accessible. With AI-generated code and automated programming tools, individuals who are not experienced developers can still create software solutions. Experienced developers can also more easily write code in languages with which they aren’t familiar.
This has the potential to democratize programming and lead to new innovations from a wider range of people. Can complete novices learn how to program using AI tools? At present, it is very challenging for complete novices to rely on AI chatbots to learn how to write code. These tools will sometimes write buggy code or code that does not work as intended, and a person with no programming experience will have difficulty learning how to spot these sorts of mistakes. AI chatbots are best thought of as a supplement to your software development education, not as a replacement, and they should be used alongside other resources. Therefore, we advise that you seek expert review and assistance. As AI tools are refined, complete novices may be able to create software products more readily on their own. AI is changing the skills required for writing code. Rather than simply memorizing syntax, developers need to focus on developing their creativity, intuition, and problem-solving skills. Additionally, programmers must also hone their abilities in architecture design and project management, as these skills are becoming increasingly important in the modern software development landscape. As AI continues to evolve, it is likely to have an even greater impact on the way we develop software in the future. As a programmer using AI, it is up to you to be responsible for what you create. You must rigorously test any code you write. It is your job to make sure any code you create with AI is not malicious and works as expected. 3.4 Tips for Coding with AI Using AI to write code is as much of an art as writing your own code. Following a few rules and suggestions will make your experience much more enjoyable and productive. Remember that AI cannot read your mind. You need to tell it what you’re thinking, and it may take several rounds of question-and-answer before you get something you can work with. Be clear about your goals. What do you want the AI to do? Be as specific as possible.
Start with a simple task. Don’t try to do too much too soon. Start with a simple task and then gradually work your way up to more complex tasks. You wouldn’t write a complete program all at once, and you shouldn’t have AI do it, either. Provide enough data. The more data (and detail) you can give the AI, the better it will be able to learn and perform the task you want it to do. Be patient. Querying an AI can take time. Don’t expect it to be perfect right away. Test everything AI writes. You are responsible for the code that AI generates, even if you didn’t write it yourself. AI can be biased, just like humans, and this could lead to suboptimal development. There is a concern that AI may focus development on areas that optimize certain metrics but not overall quality, effectiveness, or user needs. Human judgment is still needed to ensure that AI is used in a responsible and ethical way. Test your results. Make sure the AI is actually doing what you want it to do. AI systems are trained on data, and if the data is not representative of the real world, the AI system may not be able to generate code that works correctly. It is important to test the code to make sure that it is working as intended and that it is producing the desired results. Inadequate testing could lead to low quality, buggy code. Since AI-generated code would be highly automated, it requires strong testing procedures and safety oversight to ensure it meets standards before deployment. Make sure your code is secure. AI systems can be used to generate code that is vulnerable to attack, and it is important to test the code to make sure that it is secure. This can help to prevent security breaches from occurring. Use a variety of AI models. There are many different AI models available. Experiment with different models to see which one works best for your task. Get feedback from others. Ask other people to review your code and give you feedback.
This can help you identify and fix any problems with your code. 3.5 Example One: Creating a Function One of the most common things a programmer does is write functions to automate tedious tasks. With proper guidance, AI can speed this process up. Perhaps you need to write a quick function that reverses all the letters in a string. We can query Phind for help. How do I write a function that reverses the letters in a string? To reverse a string in Bash, there are multiple approaches. One way to reverse a string is by using the rev command. According to StackOverflow, the rev command is installed on most Linux systems by default. To use the rev command, simply enter rev followed by the string you want to reverse. For example: $ echo "12345" | rev However, it is important to note that the rev command is not present in all flavors of Unix. Another way to reverse a string is by using a simple Bash script. According to StackOverflow, we can write a Bash script that reverses a string using a for loop. Here is an example script: var="12345" copy=\\${var} len=${#copy} for((i=$len-1;i>=0;i--)); do rev="$rev\\${copy:$i:1}"; done echo "var: $var, rev: $rev" This script takes the string “12345” and stores it in the variable var. It then creates a copy of the string and stores it in the variable copy. The length of the string is stored in the variable len. The for loop starts at the end of the string and iterates through each character, adding it to the variable rev. Finally, the script prints the original string and the reversed string. Another approach to reversing a string is by using the tac command. According to StackOverflow, we can use the tac command to reverse the contents of a file or the standard input. Here is an example command: $ echo "12345" | sed 's/./&\\n/g' | tac | sed -e :a -e 'N;s/\\n//g;ta' This command uses the sed command to insert a newline character after each character in the string. 
It then passes the output to the tac command, which reverses the order of the lines. Finally, another sed command is used to remove the newline characters and print the reversed string. Overall, there are multiple approaches to reversing a string in Bash. The rev command, a simple Bash script with a for loop, and the tac command are all viable options. Phind offers you a variety of ways to do this, but all these options are in the language Bash. Because we didn’t include a programming language in our query, Phind picked one for us. Let’s refine our query a little. How do I write a function using Python that reverses the letters in a string? To write a function using Python that reverses the letters in a string, there are several approaches that can be taken. One approach involves using a for loop and the concatenation operator to join the original string in reverse order. This technique is demonstrated in realpython.com as follows: def reverse_string(text): result = "" for char in text: result = char + result return result print(reverse_string("Hello, World!")) # Output: '!dlroW ,olleH' In this example, the function reverse_string takes a string as its argument and initializes an empty string result. The for loop iterates over each character in the input string, concatenating it with the current content of result and reassigning the new intermediate string to result at each iteration. At the end of the loop, result holds a new string as a reversed copy of the original one. Another approach involves using slicing to reverse the string. This technique is demonstrated in stackoverflow.com as follows: def reverse_string(text): return text[::-1] print(reverse_string("Hello, World!")) # Output: '!dlroW ,olleH' In this example, the function reverse_string takes a string as its argument and returns the string in reverse order using slicing. The syntax text [::-1] means to slice the entire string with a step of -1, which reverses the order of the string. 
A third approach involves using Python’s built-in reversed() function to reverse the string. This technique is demonstrated in datagy.io as follows: def reverse_string(text): return ''.join(reversed(text)) print(reverse_string("Hello, World!")) # Output: '!dlroW ,olleH' In this example, the function reverse_string takes a string as its argument and uses the reversed() function to return an iterator that produces the characters of the string in reverse order. The join() method is then used to convert the iterator back into a string. All three approaches produce the same output, but they differ in terms of performance and readability. The first approach using a for loop and concatenation is the most straightforward, but it can be slower for large strings because it creates successive intermediate strings only to throw them away in the next iteration. The second approach using slicing is more efficient because it creates a new string object only once, but it may be less readable for some programmers. The third approach using reversed() is concise and readable, but it may be slower for large strings because it creates a reversed iterator that must be converted back into a string using join(). Once again, Phind has come through with multiple options. Now let’s say you knew you wanted to write a function like this, but you realize it’s only the strings in the third column of your file. Let’s add that requirement to our query. “How do I write a function using python that reverses the letters in a string for the third column of my file?” To write a function using Python that reverses the letters in a string for the third column of a file, you can use the extended slice syntax in Python. The extended slice syntax is a more concise and faster way of slicing strings in Python. 
To reverse a string using extended slices, you can use the following syntax: string[::-1] In this syntax, the first colon indicates the start of the slice, the second colon indicates the end of the slice, and the third colon indicates the step size of the slice. By setting the step size to -1, you can reverse the string. To create a function that reverses the letters in a string for the third column of a file, you can use the following code: def reverse_string(string): return string[:2] + string[2:5][::-1] + string[5:] In this code, the reverse_string() function takes a string as input and returns the string with its third column reversed. The first two characters of the string are returned as-is, while the third column (characters 3 to 5) are reversed using extended slices. Finally, the remaining characters of the string are returned as-is. There are other ways to reverse a string in Python, including using loops, recursion, stacks, and function calls. However, using extended slices is the most concise and efficient way to reverse a string in Python. Here are some pros and cons of using extended slices to reverse a string: Pros: Concise syntax: using the extended slice syntax is a one-liner that is easy to read and write. Fast execution: using the extended slice syntax is faster than using loops or other methods to reverse a string. Cons: Limited functionality: the extended slice syntax can only be used to reverse a string, and cannot be used for more complex string operations. Not beginner-friendly: the extended slice syntax is not intuitive for beginners who are not familiar with Python’s slicing syntax. In conclusion, to write a function using Python that reverses the letters in a string for the third column of a file, you can use the extended slice syntax in Python. This syntax is concise and efficient, but may not be beginner-friendly and may not be suitable for more complex string operations. 
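Note that the slicing in Phind’s answer operates on character positions within a single string, not on columns of a file. A minimal sketch of the file-based task follows, assuming a delimited text file read with Python’s standard csv module. The function name reverse_third_column, the tab delimiter, and the sample rows are hypothetical choices for illustration and are not part of Phind’s response.

```python
import csv
import io


def reverse_third_column(lines, delimiter="\t"):
    """Return rows with the text in the third column reversed.

    Assumes `lines` is any iterable of delimited text lines (for example,
    an open file). Rows with fewer than three fields pass through unchanged.
    """
    rows = []
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) >= 3:
            row[2] = row[2][::-1]  # extended slice reverses the string
        rows.append(row)
    return rows


# Hypothetical sample data standing in for a real file
sample = io.StringIO("id\tname\tcode\n1\tAlice\tabc\n2\tBob\txyz\n")
for row in reverse_third_column(sample):
    print(row)
```

With a real file, an open file handle could be passed in place of sample. This sketch also reverses the header row, so a file with headers would need that row skipped first.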
3.6 Example Two: Debugging Code A vast amount of time in programming is spent on debugging code. Often the compiler error messages are very precise about what has gone wrong while simultaneously lacking enough detail for the programmer to immediately understand how to fix the error. It is good practice to debug code as you go, instead of writing the entire program and then testing to identify the errors. Using an AI chatbot makes this task much, much easier. There are two options for using ChatGPT for debugging. First, we could query what the error code means. Here’s a Python code snippet that is throwing a TypeError: unsupported operand type(s) for +: 'int' and 'str' error when we try running it. def add_numbers(a, b): result = a + b return result number1 = 10 number2 = "20" sum = add_numbers(number1, number2) print("The sum of", number1, "and", number2, "is", sum) What does this error mean? TypeError: unsupported operand type(s) for +: ‘int’ and ‘str’ This error typically occurs when you try to concatenate (i.e., add together) a string and an integer using the “+” operator in Python. In Python, the “+” operator is used for both addition and concatenation depending on the data types of the operands. If you try to add a string and an integer using the “+” operator, Python will raise a TypeError because these two data types are not compatible for this operation. For example, if you have the following code: age = 25 message = "I am " + age + " years old." Python will raise the TypeError because you are trying to concatenate a string (“I am”) with an integer (25) using the “+” operator. To fix this error, you can convert the integer to a string using the str() function: age = 25 message = "I am " + str(age) + " years old." In this case, the str() function converts the integer age to a string before concatenating it with the other string, so the operation is valid and no error will be raised. 
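Applying that explanation to the snippet at the start of this example, one possible fix is to convert the string to a number before adding. This is a sketch rather than ChatGPT’s output, and it assumes the intent was numeric addition. The int() conversion and the rename of sum to total (to avoid shadowing Python’s built-in sum) are our own choices.

```python
def add_numbers(a, b):
    """Add two numbers and return the result."""
    return a + b


number1 = 10
number2 = "20"  # a string, as in the original snippet

# Assumption: numeric addition was intended, so convert the string first.
# int() raises ValueError if the string is not a valid integer.
total = add_numbers(number1, int(number2))
print("The sum of", number1, "and", number2, "is", total)
```

If string concatenation had been the goal instead, the conversion would go the other way, with str(number1).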
ChatGPT helpfully explains what TypeError means, what causes it, and how to fix it. This is quite useful when you don’t understand the error message. Another option is to submit the code snippet that is causing an error to ChatGPT directly and ask it to fix the mistake. Here’s another buggy code snippet, this time in C#. Can you debug this code? using System; class Program { static void Main() { int[] numbers = { 1, 2, 3, 4, 5 }; int sum = 0; for (int i = 1; i <= numbers.Length; i++) { sum += numbers[i]; } Console.WriteLine("The sum of the numbers is: " + sum); } } Sure, I can help you debug this code. Here are the issues I see: The loop is going out of bounds: The loop is iterating from 1 to numbers.Length, but array indexes start from 0. Therefore, when i is equal to numbers.Length, numbers[i] is trying to access an element outside the bounds of the array, which will result in an IndexOutOfRangeException. To fix this, you should change the loop condition to i < numbers.Length. The loop is not summing the array elements correctly: The loop is adding the elements starting from the second element instead of the first. To fix this, you should change the initial value of i to 0, which corresponds to the first element of the array. Here’s the corrected code: using System; class Program { static void Main() { int[] numbers = { 1, 2, 3, 4, 5 }; int sum = 0; for (int i = 0; i < numbers.Length; i++) { sum += numbers[i]; } Console.WriteLine("The sum of the numbers is: " + sum); } } This should correctly sum the elements of the numbers array and output the result. If you choose to submit your code directly to an AI chatbot for debugging, submit one snippet at a time if possible. Otherwise the answer can get quite unwieldy! 3.7 Example Three: Comparing Syntax You may be in the middle of writing code when you realize that you’ve used two possible syntaxes for the same command.
Perhaps you are using the R package here and are wondering if two different ways of writing a command are the same. Is here::here(\"raw\", \"data.csv\") equivalent to here::here(\"raw/data.csv\")? First, let’s see what ChatGPT responded. When we first started putting together this course, we had intended to include an answer from Bard that disagreed with the answer from ChatGPT as an example of why it can be helpful to query multiple chatbots. (We decided ChatGPT was correct after verifying its answer in the here package documentation.) However, when we queried Bard again two weeks later, we were pleasantly surprised to discover that Bard had corrected its answer. Now it’s a cool little example of how the chatbots are rapidly improving and “learning” from their mistakes. 3.8 Example Four: Figuring Out a Common Task Sometimes you have a task that needs done that is fairly routine. You might not have done it before, but plenty of people in the world have. In this case, you don’t want to reinvent the wheel, but you may not know exactly where to start. AI is a great option for figuring out the basic workflow steps. A very common task for anyone working with genomic samples is to combine multiple files in the vcf format. Perhaps you’ve been asked to do this and then create a separate file that contains the names of the samples. You turn to ChatGPT for help. Can you write a program that merges vcf files and then outputs the sample names? Great! Of course, maybe you’re not comfortable using bcftools, or you try this and can’t get it to work. You can take advantage of the “Regenerate Response” option in ChatGPT to get another solution. What’s another way I can do this? You can also specify some of the features you’d like in your code, or where you might want the code to come from. Maybe you want to practice using GitHub repositories, so you specifically request it. Can you write a program that merges vcf files and then outputs the sample names using a github repo? 
If you know you’ve used something like a GitHub repository, you can ask ChatGPT for the citation. Just be aware that any citation it gives you needs to be verified independently. Unfortunately, AI bots can still respond with imaginary journal articles. (Assume that if you can’t find it, it doesn’t exist.) What’s the citation for the repo used in this code? In general, after you find a solution that works for you, you should always make a habit of asking the AI where it got the code. Never, ever, ever run code without knowing where it came from. What’s the source for this code? When writing code, always take care to use only code that you’re authorized to use. This means asking the AI chatbot where code snippets are from and tracking down those citations and authorizations yourself (instead of just trusting the AI that everything is okay). 3.9 Example Five: Planning Your Program Let’s say we want to create software that plays the card game blackjack. You can jump right in and ask your AI chatbot to write the code, but you’re unlikely to be successful and incredibly likely to get frustrated. Before you start interacting with the AI, you need to plan out what you want to do. Everyone needs a plan of attack when building software. You should start by identifying what your software will do, then you can figure out the steps necessary to make that happen. We’ll ask Claude-instant (as implemented by Poe) how we can go about creating the necessary code for our game. How would I go about writing a program that can play blackjack? That seems like a really good roadmap for building a basic game, but if you’re ambitious (and of course you are), you might want to give your game a little extra something. Claude-instant has some ideas for that, too. What are some other features I could add to this game? At this point, you are ready to begin writing your program. You have a lot of tasks to do, so you have to pick a logical starting point.
If you’re uncertain, you can ask AI for advice on your first task. What should be my first coding task for this program? In this case, the AI went beyond just suggesting a first task and also offered several options for how to do it as well. 3.10 Example Six: Selecting a Coding Language You will need to choose the language for your software. If you don’t, AI will likely pick a language for you, which may or may not be one you want. You might choose a language based on: Your preexisting knowledge or experience with the language A specific use case where the language is commonly used The language(s) your colleagues or coworkers use You can find a description of other languages you might encounter here. Let’s ask Claude-instant again for some recommendations on the appropriate languages for our proposed blackjack game from the above example. (Pro tip: the various AI bots all keep track of your previous queries, so you don’t have to restate your goal for the software if you’re continuing the conversation.) What would be a good coding language to create this game of blackjack? Claude-instant actually gave us more information than we anticipated. If it hadn’t stated pros and cons while offering language suggestions, we could ask for them directly. This could be helpful if you’re deciding between two options like, say, Python or Java. What are the pros and cons of using Python vs Java for this project? 3.11 Hands On Exercise Now it’s your turn to try! 3.11.1 Code Here’s the code you need for question #2. 
def calculate_average(numbers): total = 0 for number in numbers: total += number average = total / len(numbers) return average numbers = [1, 2, 3, 4, "5"] average = calculate_average(numbers) print("The average of the numbers is:", average) 3.11.2 Questions Write a prompt to create a function that replaces all instances of the string “three” with the number 3, then create a prompt that allows you to modify the function so that “three”, “Three”, and “THREE” are all replaced with the number 3. Write a query that helps you debug the code snippet from above. Create a query that explains how to retrieve data from an online database. Don’t forget to find out what the source of the code is. Write a prompt to plan out a program that creates and plays a game of Rock, Paper, Scissors. You should create a multi-step plan as well as figure out what your first coding task should be. Create a prompt that identifies the best coding language for the above game. 3.12 Summary Coding has become an essential skill in today’s world. AI can increase the efficiency of planning, writing, and debugging code, as well as make coding more accessible for beginners. Querying AI on code writing tasks requires patience. AI cannot read your mind, and users need to be both specific and patient when asking for coding help. It is essential to test all AI-written code while in development, since inadequate testing could lead to low quality, buggy code. A programmer is responsible for what they create while coding with AI, even if AI writes the actual code. 
devtools::session_info() ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2023-07-14 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2023-03-28 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.0.2) ## cachem 1.0.7 2023-02-24 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## curl 4.3 2019-12-02 [1] RSPM (R 4.0.3) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) ## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) ## evaluate 0.20 2023-01-17 [1] CRAN (R 4.0.2) ## fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## highr 0.8 2019-03-20 [1] RSPM (R 4.0.3) ## hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0) ## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.0.2) ## httr 1.4.2 2020-07-20 [1] RSPM (R 4.0.3) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2023-03-28 [1] Github (yihui/knitr@a1052d1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.0.2) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## ottrpal 1.0.1 2023-03-28 [1] Github (jhudsl/ottrpal@151e412) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.3) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 
2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 [1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## readr 1.4.0 2020-10-05 [1] RSPM (R 4.0.2) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.0 2023-03-14 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2023-03-28 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.2) ## sass 0.4.5 2023-01-24 [1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2023-03-28 [1] Github (R-lib/testthat@e99155a) ## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.0.2) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## utf8 1.1.4 2018-05-24 [1] RSPM (R 4.0.3) ## vctrs 0.6.1 2023-03-22 [1] CRAN (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2023-03-28 [1] Github (yihui/xfun@74c2a66) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["refactoring-code.html", "Chapter 4 Refactoring Code 4.1 Learning Objectives 4.2 Refactoring Basics 4.3 AI in Code Refactoring 4.4 Example One: Correcting Syntax 4.5 Example Two: Styling and Convention 4.6 Example Three: Staying DRY 4.7 Example Four: Keeping Code Alive 4.8 Example Five: Making Code Readable 4.9 Example Six: Keep it Brief 4.10 Code optimization 4.11 Challenges and Limitations 4.12 Hands-On Exercise 4.13 Summary", " Chapter 4 Refactoring Code 4.1 Learning Objectives Describe how refactoring code involves optimization for maintainability, efficiency, and reuse Explain why refactoring code is important for developers in the long-term Recognize the benefits and limitations of using AI tools to refactor code, as well as why AI tools are uniquely poised to be beneficial Implement prompt strategies that can be used to assist with refactoring code for correcting syntax, for adopting more consistent 
styling, for making code more concise, for making code easier to maintain, and for making code more efficient 4.2 Refactoring Basics Code refactoring is the process of improving the quality of underlying code without changing its functionality. In other words, it’s a way of cleaning up and optimizing code so that it’s easier to maintain and more efficient. This often involves making small changes to the code, such as renaming variables or functions, reorganizing code blocks, or simplifying complex expressions. Refactoring is an essential practice in software development and helps to ensure that the codebase remains manageable and adaptable as requirements and business needs change over time. Code refactoring helps to reduce technical debt, which is the accumulation of development work that needs to be done in the future as a result of taking shortcuts or using less than optimal solutions. Refactoring can help to identify potential issues and bugs before they cause problems, resulting in more stable software. Finally, refactoring code to be more readable and less verbose can create fewer headaches for you in the future. To understand why code refactoring is important, imagine trying to cook in a disorganized and cluttered kitchen. You might be a fantastic cook, but you are spending more time searching for utensils and ingredients, and the quality of the meal could suffer as a result. Similarly, code that is disorganized and cluttered can be difficult to work with and can result in lower-quality software. Refactoring is like cleaning and organizing the kitchen - it makes the process of cooking (or coding) smoother and more efficient. 4.3 AI in Code Refactoring Code refactoring has historically been done manually by developers. This involves reviewing code and identifying areas that could be improved or optimized, and then making changes to the codebase accordingly. 
Though important, this process is time-consuming and labor-intensive, as it requires developers to carefully review every line of code to identify potential issues or areas for improvement. Additionally, manual code refactoring is error-prone, as developers can accidentally introduce bugs or errors into the codebase while making changes. However, AI has significant potential to help with code refactoring. AI can use machine learning algorithms to analyze large amounts of code and identify patterns or areas that could be improved. For example, they can identify sections of code that are redundant, overly complex, or difficult to maintain, and suggest changes that could be made to improve the codebase. Machine learning algorithms can also help to identify potential bugs or security issues in the codebase, which can help to improve the overall quality and stability of the software. AI refactoring can also be faster, and in many cases more accurate, than manual refactoring. This is particularly useful for large-scale software projects with massive codebases, where manual code review and refactoring can be an enormous task. In the next sections, we’ll take a look at some examples of using AI to refactor code. 4.4 Example One: Correcting Syntax Programming languages operate using their established set of rules. For example, Python uses indentation to define code blocks. However, when you shift between development environments, indentation and line breaks can sometimes be affected, causing major headaches! Let’s say we run the following code: def calculate_sum(a,b): result = a + b return result def calculate_product(a,b):result = a * b return result def main(): a=5 b=6 sum_result = calculate_sum(a, b) print("Sum:", sum_result) product_result = calculate_product(a,b) print("Product:", product_result) if __name__ == '__main__': main() ## unexpected indent (<string>, line 3) We get the error above because our indentation is off. Let’s ask ChatGPT to clean things up.
Can you assist me with correcting the formatting and indentation issues in my Python code? Of course! Here is the code after correcting the formatting and indentation: def calculate_sum(a, b): result = a + b return result def calculate_product(a, b): result = a * b return result def main(): a = 5 b = 6 sum_result = calculate_sum(a, b) print("Sum:", sum_result) product_result = calculate_product(a, b) print("Product:", product_result) if __name__ == '__main__': main() ## Sum: 11 ## Product: 30 Much better! This can be quite handy when you can’t immediately see what is wrong with the formatting or if you are working with a large codebase. 4.5 Example Two: Styling and Convention Sometimes programming languages have more flexible rules where there are multiple ways to take the same or almost the same action. For example, in the R programming language, there are several ways to perform variable assignment. However, it is generally accepted among R programmers that the <- operator should be used. Single and double quotes can be used for character types interchangeably, but it’s generally a good idea to stick with one or the other. 4.5.1 Commonly Accepted Convention Let’s say we have the following code: a <- "apple" b = 'banana' "carrot" -> c c(a, b, c) ## [1] "apple" "banana" "carrot" This code does not produce any errors, but it doesn’t follow typical R convention. Let’s ask ChatGPT to clean things up. Can you help me reformat my R code to follow accepted style conventions? Sure, here’s the cleaned-up R code: a <- "apple" b <- "banana" c <- "carrot" Much better! This code will now look more familiar to other R programmers. 4.5.2 Visual Styling Some languages are flexible about line breaks, indentation, and spacing among code blocks. If you’re rapidly developing, things can get messy pretty quickly. Many development environments, for example RStudio and PyCharm, have the ability to format your code for you without changing any of the code itself.
However, you can also get AI to do this according to your own preferences. d <- c(rep(a,5),rep(b,5),rep(c,5),paste(a,b,c),paste(c,b,a)) This code is pretty hard to read. Let’s see if AI can do better. Can you space out this R code to make it more readable? Sure! Here’s the same code with added spacing for improved readability: d <- c(rep(a, 5), rep(b, 5), rep(c, 5), paste(a, b, c), paste(c, b, a)) Better! It’s easier to interpret because only one action is being taken on each step. By spacing out the code and breaking it into multiple lines, it becomes easier for the human eye to scan for structure, logic, errors, and/or inconsistencies. 4.6 Example Three: Staying DRY The DRY (Don’t Repeat Yourself) programming tenet is essential to good software. By avoiding repetition in code, we can reduce the amount of duplicate code we need to write and maintain. When code is repeated and we need to change its functionality, we have to make the same change in multiple places, which can be time-consuming and error-prone. Let’s take the following Python code: def print_greetings(): print("Hello, Alice!") print("Hello, Bob!") print("Hello, Carol!") print("Hello, David!") print("Hello, Eve!") print_greetings() ## Hello, Alice! ## Hello, Bob! ## Hello, Carol! ## Hello, David! ## Hello, Eve! Notice how repetitive this code is? print() and the string \"Hello, \" appear five times. We can improve this! Could you please refactor my Python code to make it more DRY? Sure, here’s a refactored version of your code: def print_greeting(name): print(f"Hello, {name}!") def print_greetings(): names = ["Alice", "Bob", "Carol", "David", "Eve"] for name in names: print_greeting(name) print_greetings() ## Hello, Alice! ## Hello, Bob! ## Hello, Carol! ## Hello, David! ## Hello, Eve! Now let’s imagine our software lead has asked us to change the greeting to Spanish.
Notice how we only have to change the second line of code: def print_greeting(name): print(f"¡Hola, {name}!") def print_greetings(): names = ["Alice", "Bob", "Carol", "David", "Eve"] for name in names: print_greeting(name) print_greetings() ## ¡Hola, Alice! ## ¡Hola, Bob! ## ¡Hola, Carol! ## ¡Hola, David! ## ¡Hola, Eve! Much easier. 4.7 Example Four: Keeping Code Alive During software development, code is often changing rapidly. It’s common to end up with functions or variables you don’t need anymore. It’s best practice to remove “dead” code from your codebase: it’s distracting and can make maintenance more challenging. Programmers should resist the urge to hang on to unused code in the expectation that they might need it one day. Version control, such as that provided by Git and GitHub, lets you keep a record of old code without cluttering your final version. Let’s take a look at some Python code with some dead code: def default_values(): a = 5 b = 6 return a, b def calculate_sum(a, b): result = a + b return result def calculate_product(x, y): return x * y def simulate_data(): a, b = default_values() dat = calculate_sum(a, b) + calculate_sum(a, b) return dat def main(): a = 5 b = 6 a, b = default_values() sum_result = calculate_sum(a, b) print("Sum:", sum_result) if __name__ == '__main__': main() ## Sum: 11 Some of these functions aren’t used in the final step, typically defined in Python as the main() function. Let’s ask AI to remove them. I need help with eliminating unused functions in my Python script. Sure! Here’s a refactored version: def default_values(): a = 5 b = 6 return a, b def calculate_sum(a, b): result = a + b return result def main(): a = 5 b = 6 a, b = default_values() sum_result = calculate_sum(a, b) print("Sum:", sum_result) if __name__ == '__main__': main() ## Sum: 11 Great! Now our code is much more manageable. This code only includes the necessary functions needed to run the main() function. We can go a step further!
Can you help me refactor my python code to make it more concise? Certainly! Here’s a more concise version of your code: def calculate_sum(a=5, b=6): return a + b def main(): sum_result = calculate_sum() print("Sum:", sum_result) if __name__ == '__main__': main() ## Sum: 11 In this version, the calculate_sum function is simplified to take default arguments for a and b. This means the default_values function is no longer necessary. The a and b variables are also removed from main, as they are no longer needed. Much better and easier to look at overall. 4.8 Example Five: Making Code Readable Usually, our first goal as programmers is to get the response we need from the code. Readability might be the last thing on our minds. Luckily, AI can help with this. Let’s imagine we whipped up this code pretty quickly. It is tough to read, and it could definitely be improved. for(term in c("good","bad","ugly")){ print(c(rep(term,3),paste("really",stringr::str_to_upper(term)))) } ## [1] "good" "good" "good" "really GOOD" ## [1] "bad" "bad" "bad" "really BAD" ## [1] "ugly" "ugly" "ugly" "really UGLY" Let’s see if AI can do better. Can you help me clean up this R code so it is more readable? Sure! Here’s a more readable version of the code: library(stringr) for(term in c("good", "bad", "ugly")) { print(c(rep(term, 3), paste("really", str_to_upper(term)))) } ## [1] "good" "good" "good" "really GOOD" ## [1] "bad" "bad" "bad" "really BAD" ## [1] "ugly" "ugly" "ugly" "really UGLY" I added whitespace around operators and after commas for better readability. I also loaded the stringr package explicitly using library() to make the code more self-contained. This is a good start, but not much has changed. Let’s try again. On ChatGPT, we can click “Regenerate Response”. 
Here’s a cleaned-up version of the R code that is more readable: library(stringr) terms <- c("good", "bad", "ugly") for (term in terms) { rep_term <- rep(term, 3) upper_term <- str_to_upper(term) really_upper_term <- paste("really", upper_term) result <- c(rep_term, really_upper_term) print(result) } ## [1] "good" "good" "good" "really GOOD" ## [1] "bad" "bad" "bad" "really BAD" ## [1] "ugly" "ugly" "ugly" "really UGLY" I’ve stored the repeated term and the uppercased term in separate variables to make the code more readable. I’ve also separated the construction of the “really” phrase from the uppercased term to make it clearer what’s happening. Finally, I’ve used whitespace and indentation to make the structure of the code more apparent. Better! While the code takes up a greater number of lines, it’s easier to interpret because only one action is being taken on each step. 4.9 Example Six: Keep it Brief As we learned above, following the DRY principle and removing dead code can make your code easier to interpret, maintain, and debug. However, sometimes the issue is the functions themselves. No one has perfect knowledge of everything in their programming language. AI can help us determine if there are any more efficient sets of functions that do the same thing. Here is some code that plays with the “CO2” dataset in R: library(datasets) for (type in unique(CO2$Type)) { dat_subset <- CO2[CO2$Type == type, ] for (plant in unique(dat_subset$Plant)) { dat_subset_2 <- dat_subset[dat_subset$Plant == plant, ] print(paste(dat_subset_2$Plant[1], round(mean(dat_subset_2$uptake), 1))) } } ## [1] "Qn1 33.2" ## [1] "Qn2 35.2" ## [1] "Qn3 37.6" ## [1] "Qc1 30" ## [1] "Qc2 32.7" ## [1] "Qc3 32.6" ## [1] "Mn1 26.4" ## [1] "Mn2 27.3" ## [1] "Mn3 24.1" ## [1] "Mc1 18" ## [1] "Mc2 12.1" ## [1] "Mc3 17.3" Let’s see if AI has any ideas for making it more concise. Can you make this code more concise? 
Here’s a more concise version using dplyr package in R: library(dplyr) CO2 %>% group_by(Type, Plant) %>% summarize(mean_uptake = mean(uptake)) ## # A tibble: 12 × 3 ## # Groups: Type [2] ## Type Plant mean_uptake ## <fct> <ord> <dbl> ## 1 Quebec Qn1 33.2 ## 2 Quebec Qn2 35.2 ## 3 Quebec Qn3 37.6 ## 4 Quebec Qc1 30.0 ## 5 Quebec Qc3 32.6 ## 6 Quebec Qc2 32.7 ## 7 Mississippi Mn3 24.1 ## 8 Mississippi Mn2 27.3 ## 9 Mississippi Mn1 26.4 ## 10 Mississippi Mc2 12.1 ## 11 Mississippi Mc3 17.3 ## 12 Mississippi Mc1 18 This code groups the CO2 dataset by Type and Plant, calculates the mean uptake for each group using summarize, and then prints the resulting data frame. The code is shorter and more concise because it uses the dplyr package, which has built-in functions for data manipulation and summarization. Refactoring the code to be more concise using dplyr makes it easier to read and understand the steps involved in the data manipulation. It also reduces the number of nested loops and intermediate variables, making the code more efficient and reducing the chance of errors. Ultimately, it is your job as a programmer to use your best judgment on the fine balance between brevity and readability, as the best solution is not always this obvious. Remember that you can regenerate the AI’s response if you want more options. 4.10 Code optimization Sometimes speed is of the essence. Loops can be slow in many languages. Take for example the following R code that, for each element of x, sums its products with every element of y using nested for loops: set.seed(123) n <- 10000 x <- rnorm(n) y <- rnorm(n) z <- numeric(n) # Start timer start_time <- proc.time() for(i in 1:n) { for(j in 1:n) { z[i] <- z[i] + x[i]*y[j] } } # End timer proc.time() - start_time ## user system elapsed ## 12.208 0.000 12.208 What is a faster, more optimized way of running the following R code?
set.seed(123) n <- 10000 x <- rnorm(n) y <- rnorm(n) # Start timer start_time <- proc.time() z <- outer(x, y, "*") z <- rowSums(z) # End timer proc.time() - start_time ## user system elapsed ## 0.575 0.327 0.598 The outer() function performs the same calculation as the nested loop in the original code, but more efficiently. It returns a matrix of all possible combinations of x and y values, with each element of the matrix being the product of the corresponding x and y values. The rowSums() function is then used to sum the elements of each row of the matrix, which is equivalent to summing the products of x and y for each index i in the original loop. This method avoids the need for the nested loop, resulting in a faster and more efficient computation. So much faster! We can look at another option by regenerating the response: set.seed(123) n <- 10000 x <- rnorm(n) y <- rnorm(n) z <- numeric(n) # Start timer start_time <- proc.time() z <- x %*% t(y) # End timer proc.time() - start_time ## user system elapsed ## 0.265 0.288 0.350 One optimized way to perform the same calculation is by using the %*% operator to perform matrix multiplication. This can be done by converting x and y to matrices and transposing one of them so that their dimensions align for matrix multiplication. This code should be much faster than the original implementation because it takes advantage of highly optimized matrix multiplication algorithms in R. Note that, as written, x %*% t(y) returns the full n × n matrix of products rather than the length-n vector z from the original loop, so rowSums() would still need to be applied to reproduce the original result. Always check that a regenerated answer computes the same thing before comparing timings. While this second suggestion ran faster, you will need to consider what aspects of the codebase are most important in each instance. For example, this code runs more quickly, but the %*% operator might be unfamiliar to some R programmers. In cases where efficiency is less important, or the data are not large, you might consider maximizing readability. 4.11 Challenges and Limitations Like humans, AI can make mistakes. Be sure to keep the following in mind as you use AI to refactor your code.
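One practical guard against such mistakes is to compare a rewritten version against the original on a small input before trusting its speed. The sketch below does this in Python rather than R, purely for illustration; the function names (`row_sums_loop`, `row_sums_fast`, `outer_product`) and the toy vectors are assumptions made for this example and are not part of the course code.

```python
import math

def row_sums_loop(x, y):
    """Reference version: for each x[i], add up x[i] * y[j] over all j."""
    z = [0.0] * len(x)
    for i in range(len(x)):
        for j in range(len(y)):
            z[i] += x[i] * y[j]
    return z

def row_sums_fast(x, y):
    """Candidate rewrite: sum(y) is computed once and reused for every x[i]."""
    total_y = sum(y)
    return [xi * total_y for xi in x]

def outer_product(x, y):
    """Full table of products; a different shape from the loop's result."""
    return [[xi * yj for yj in y] for xi in x]

# Hypothetical toy inputs, small enough to check by hand.
x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 4.0]

reference = row_sums_loop(x, y)
candidate = row_sums_fast(x, y)

# The rewrite should agree with the reference up to floating-point rounding.
assert all(math.isclose(r, c) for r, c in zip(reference, candidate))

# The bare outer product is a 3 x 3 table, not a length-3 vector.
table = outer_product(x, y)
assert len(table) == len(x) and len(table[0]) == len(y)

# Summing each row of the table recovers the reference result.
assert all(math.isclose(sum(row), r) for row, r in zip(table, reference))
```

A check like this is cheap to write and would flag a suggestion that returns a matrix where a vector is expected, regardless of how fast it runs.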
4.11.1 Complexity Refactoring is often a complex process that requires deep understanding of the code and its context. AI may not always be able to fully understand the complexity of the code and may struggle to identify the best refactoring strategy. Use a modular approach whenever possible. 4.11.2 Limited Data AI models require large amounts of data to learn from, but in the case of code refactoring, there is often limited data available. This can make it difficult for AI models to generalize to new code bases and situations, especially if you are using a more niche programming language. 4.11.3 Quality Control Automated refactoring tools that use AI may not always produce code that is of the same quality as code produced by human developers. It can be difficult to always ensure that the refactored code is maintainable, efficient, and free of bugs. You need to use your best judgment when copying and pasting AI-produced code into your codebase. You should always include unit tests in your code. Tests can help you catch bugs, including those introduced accidentally by AI. Because AI models are created by humans, they can be biased. This means they may not always identify your preferred refactorings or may prioritize certain types of refactorings over others. In some cases, this can lead to suboptimal code quality and may create technical debt over time. 4.11.4 Security When using AI to refactor code, the code itself is often sent to an external service or platform for analysis and transformation. This can raise concerns about the security of the code, especially if it contains sensitive information such as trade secrets, proprietary algorithms, or personal data. If your code is sensitive, it’s important to carefully vet any third-party AI tools or services used in the refactoring process. 4.12 Hands-On Exercise Now it’s your turn to try. 4.12.1 The Code Let’s say you are dusting off some code from your past (no judgment here). You were investigating tweets about Mr. 
Trash Wheel, a beloved Baltimore-based contraption that filters trash out of the waterways. Note: This code is just an example and was written strictly for educational purposes. import tweepy import pandas # Enter your API keys and access tokens here consumer_key = 'your_consumer_key' consumer_secret = "your_consumer_secret" access_token = 'your_access_token' access_token_secret = 'your_access_token_secret' # Authenticate with Twitter API auth = tweepy.OAuthHandler(consumer_key, consumer_secret) auth.set_access_token(access_token, access_token_secret) # Search for tweets containing the search term tweets = tweepy.Cursor(tweepy.API(auth).search_tweets, q="Mr. Trash Wheel", tweet_mode='extended').items(100) # Create empty list to store tweet data date_data = [] location_data = [] text_data = [] def get_tweet_length(tweet): # Return the length of the tweet text return len(tweet.full_text) # Loop through each tweet and extract desired data for tweet in tweets: date_info = { 'date': tweet.created_at } date_data.append(date_info) # Search for tweets containing the search term tweets = tweepy.Cursor(tweepy.API(auth).search_tweets, q="Mr. Trash Wheel", tweet_mode='extended').items(100) # Loop through each tweet and extract desired data for tweet in tweets: location_info = { 'location': tweet.user.location } location_data.append(location_info) # Search for tweets containing the search term tweets = tweepy.Cursor(tweepy.API(auth).search_tweets, q='Mr. Trash Wheel', tweet_mode='extended').items(100) # Loop through each tweet and extract desired data for tweet in tweets: text_info = { 'text': tweet.full_text } text_data.append(text_info) # Combine lists into a dictionary data = {'date': date_data, 'location': location_data, 'text': text_data} # Store results in pandas dataframe df = pandas.DataFrame(data) # Print dataframe print(df) 4.12.2 Questions Create an AI prompt that fixes any formatting issues with the code that would cause it not to run. 
Devise an AI prompt that removes any dead code from your sample above. What gets removed? Create a prompt that makes the code less repetitive, adhering to the DRY principle. What aspect of the code was repetitive? Construct a prompt that makes the code more concise. What are some trade-offs that appear in this code between readability and brevity? 4.13 Summary Code refactoring is the process of improving code quality without changing its functionality. It is crucial in software development to maintain a manageable and adaptable codebase. Code refactoring reduces technical debt, improves code stability, and makes it easier to maintain. Examples of using AI for code refactoring include correcting syntax, adhering to styling and convention, visual styling, avoiding repetition, removing dead code, and improving both readability and speed of execution (optimization). The use of AI for code refactoring raises ethical concerns and is not perfect. It is important for the developer to consider security needs of their code, as well as test out their code. "],["annotating-your-code.html", "Chapter 5 Annotating Your Code 5.1 Learning Objectives 5.2 Annotating Code is Good Practice 5.3 Improves readability 5.4 Improves maintainability 5.5 Improves the quality 5.6 Examples of quality aiding comments 5.7 AI in Code Annotation 5.8 Ethics of using AI to annotate code 5.9 Examples on using AI for annotation 5.10 Summary", " Chapter 5 Annotating Your Code 5.1 Learning Objectives Explain the significance and benefits of code annotation and how it aids in understanding and working with code in the future. Demonstrate the ability to apply effective code annotation practices by providing clear and concise comments in code. Recognize the impact of code annotation on code maintainability. Explain how well-annotated code can facilitate collaboration among developers and ensure code consistency. 
Critically evaluate the use of AI in code annotation, including benefits, ethical considerations, and limitations. Explore how AI can be used to create README files, do line-by-line annotations, and offer potential code improvements. 5.2 Annotating Code is Good Practice Code annotation, also known as code commenting, is the process of adding explanatory notes to source code. These notes are used to provide context, clarify functionality, and aid in understanding for developers who may be working on the codebase in the future. Code annotation is an important practice for any developer looking to write clear, efficient, and maintainable code. There are many benefits to annotating code: Improves readability Improves maintainability Improves quality 5.3 Improves readability First, it can help to improve the readability of code. By adding comments, developers can explain the purpose of each section of code, which can make it easier for others to understand what the code is doing. This can be especially helpful for large or complex codebases, where it can be difficult to keep track of all the different components. This is particularly important in contexts where the original developer of a project may move on to something else and others are left to work on it. Annotation perhaps most often helps a developer remember things they knew about the code when they were originally writing it (annotation is helpful for future you!). If the original developer left well-annotated code, it can drastically improve the ability of others (and their future selves) to continue with the project. 5.3.1 Examples of readability aiding comments Comments that help readability clarify what the code is doing. Function explanations: A comment at the beginning of a function or method can describe its purpose, parameters, and expected return values. This makes it easier for others to understand what the function does and how to use it.
# This function calculates and returns the sum of two numbers (x and y) def add_numbers(x, y): return x + y Inline explanations: Use inline comments to explain what a specific line or block of code does. This can help someone reading your code to quickly understand what’s going on. # Loop through each item in the list for item in my_list: # Check if the item is greater than 10 if item > 10: # Print the item print(item) TODO comments: Use TODO comments to mark areas of your code that need further work or improvement. This can help you or others to remember to come back to a specific part of the code. # TODO: Implement error handling for this function def my_function(): pass Parameter descriptions: If a function or method has complex parameters, it can be helpful to add comments explaining what each parameter does. def my_function(parameter1, parameter2): """ Calculate the sum of two numbers. Parameters: parameter1 (int): The first number to be added. parameter2 (int): The second number to be added. Returns: int: The sum of parameter1 and parameter2. """ return parameter1 + parameter2 Code block summary explanations: If you have a long or complex code block, you can add a comment to explain what the block is doing. # This code block creates a dictionary containing the counts of each word in a list of text strings word_counts = {} for text in text_list: for word in text.split(): if word not in word_counts: word_counts[word] = 0 word_counts[word] += 1 5.4 Improves maintainability Code annotation can help to improve the maintainability of code. By adding comments, developers can explain the reasoning behind certain decisions, which can make it easier for others to make changes to the code without breaking it. This can be especially helpful when multiple developers are working on the same codebase, as it can help to prevent conflicts and ensure that the code is always in a consistent state. 
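As a minimal sketch of a comment that records the reasoning behind a decision rather than restating the code, consider the following. The function `fib`, the use of `functools.lru_cache`, and the performance claim in the comment are all hypothetical, chosen only to illustrate the idea.

```python
from functools import lru_cache

# Design decision (hypothetical): we cache results because the plain
# recursive version recomputes the same subproblems many times and became
# too slow for larger inputs. If memory becomes a concern, replace
# maxsize=None with a finite cache size.
@lru_cache(maxsize=None)
def fib(n):
    """Return the n-th Fibonacci number (fib(0) == 0, fib(1) == 1)."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025
```

A future maintainer reading this comment knows the cache is deliberate and what to change if the trade-off no longer holds, which is the kind of context the code alone cannot convey.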
5.4.1 Examples of maintainability aiding comments Comments that help maintainability explain the historical context of why code was made the way it was. Design decisions: If there were specific design decisions made when creating the code, you can add comments explaining why certain choices were made. # We chose to use a linked list data structure for this function to reduce the time complexity of inserting and deleting elements. Legacy code: Sometimes, code may have been written in a certain way due to constraints or limitations at the time it was created. Adding comments to explain this can help others understand why the code is the way it is. # This code was written before Python 3.3, which introduced the `yield from` syntax. Therefore, we used a `for` loop to iterate over the nested list. Performance optimizations: If certain performance optimizations were made to the code, you can add comments explaining why they were necessary. # We used memoization to improve the time complexity of this recursive function, as it was taking too long to execute for larger inputs. Compatibility considerations: If the code was written with compatibility considerations in mind, you can add comments explaining why certain choices were made. # We used the `os.path` module to ensure that this code will work on both Windows and Unix-based systems, as the path separators are different on each platform. Limitations: If there are limitations or edge cases that the code cannot handle, you can add comments to explain this to others. # Note that this function assumes that the input array is sorted in ascending order. If the array is unsorted, the results may be incorrect. 5.5 Improves the quality Code annotation can help to improve the quality of code. By adding comments, developers can identify potential issues or edge cases, which can help to prevent these issues from occurring in the first place.
This can be especially helpful when testing code, as it can help to ensure that the code is working as expected. 5.6 Examples of quality aiding comments These comments improve the quality of the code by explaining to others how to use it or how to help improve it. Error handling: Adding comments to explain how and why error handling is being implemented can help ensure that your code is robust and able to handle unexpected inputs or errors. This helps others know how the code was intended to be used. # If the input argument is not a list or is empty, raise a ValueError if not isinstance(input_list, list) or len(input_list) == 0: raise ValueError("Input must be a list that is not empty") Complexity: If your code has particular complexities, adding comments that explain them can help others understand the performance characteristics of your code. It may help others identify whether there is a simpler way to write the code. By adding a comment that expresses uncertainty about the code and asks for suggestions, the author can potentially receive feedback from others on how to improve the code. # This block of code could probably be simplified, but I'm not sure how. new_list = [] for i in old_list: if i > 0: new_list.append(i) Constants and variables: Adding comments to explain the purpose of constants and variables can make your code easier to use. It not only notifies others of the variable’s existence but also lets them know if they need to change the parameters for their own purposes. # This constant represents the maximum allowed number of retries when attempting to connect to the server. MAX_RETRIES = 3 # This variable tracks the number of failed attempts to connect to the server. num_retries = 0 Code organization or disorganization: Adding comments to explain the organization and structure of your code can make it easier for others to navigate and may bring out a better organizational strategy for your code.
# Define helper functions for data processing def preprocess_data(input_data): # Implementation details def analyze_data(processed_data): # Implementation details def visualize_data(analyzed_data): # Implementation details # Load data from file input_data = load_data("input.txt") # Preprocess data processed_data = preprocess_data(input_data) # Analyze data analyzed_data = analyze_data(processed_data) # Visualize data visualize_data(analyzed_data) Assumptions and constraints: Adding comments to explain the assumptions and constraints of your code can help others understand its limitations and potential pitfalls. # This function assumes that all input values are positive integers. Negative or non-integer inputs may result in unexpected behavior. 5.7 AI in Code Annotation We’ve discussed why code annotation and documentation are important, but when and why would you use AI for code annotation? AI can be a nice tool to supplement the annotation of your code. It should not be the only source of annotation for your code, because as the code author, you need to verify that what AI has annotated is consistent with your knowledge and history of development of the code. But using AI to annotate your code can be useful to supplement existing code annotations or to annotate old code that was poorly annotated either by yourself or others who are no longer working on the project. Here are some of the benefits of using AI for code annotation: Speed and efficiency: AI algorithms can analyze code much faster than humans, which means they can quickly generate comments and annotations for large codebases. This can save developers significant time and effort, allowing them to focus on other aspects of the development process. Consistency: Unlike humans, AI is not affected by personal biases or preferences, so it can provide consistent annotations across different code files and projects. 
A human may underestimate places in the code that should have annotation, whereas an AI might be more consistent at putting annotation in these places. This can help ensure that all code in a project is well-documented and easy to understand. Objectivity: AI can analyze code objectively and identify potential issues that may have been missed by humans. This can lead to better quality code that is easier to maintain and less prone to errors. Learning: AI models can learn from large datasets of annotated code and improve their ability to generate comments and annotations over time. This means that the more code the AI model analyzes, the more accurate and effective it becomes at annotating code. Specificity: AI models can be trained on specific programming languages, frameworks, or libraries, allowing them to generate language-specific comments and annotations that are tailored to the needs of the project. This can be particularly helpful for large, complex projects that require specialized knowledge or expertise. 5.8 Ethics of using AI to annotate code However, there are also a number of potential ethical concerns associated with using AI to annotate code. For example, we don’t yet know how and in what ways AI models may be biased. Additionally, AI models may be opaque, which could make it difficult to understand why they made certain annotations. Finally, because AI models are not humans and don’t necessarily tell the stories behind the code, they may be used to generate annotations that are not accurate, helpful, or do not tell the full depth of the history of what occurred with the code, which could lead to problems. Given these potential benefits and concerns, it is important for users of AI to always realize that they are ultimately accountable for the annotation that an AI model makes, and careful review of this annotation is needed. There are a number of ethical considerations to take into account when using AI to annotate code. 
Here are a few of the most important considerations: Accountability: First and foremost, a user of AI is always primarily accountable for any output that they use from an AI model. AI models can give you annotation to start with, but it is up to you as the user to verify and review this output carefully. The user of the AI model is ultimately responsible for keeping or throwing out the annotations the AI makes and thus is responsible for any of the model’s errors that they keep. Much like a user of a search engine such as Google is responsible for which results they use, a user of AI is responsible for what output from the AI model they use. Transparency: It is important to be transparent about the use of AI in code annotation. This means disclosing the fact that AI is being used, as well as the specific AI model and dataset that is being used. This should be stated on every file where annotation has been made using AI. This transparency allows others who view the code to more fully interpret the AI-created annotation that accompanies the code. Bias: AI models are trained on data, and this data can introduce bias into the model. It is important to be aware of the potential for bias and to take steps to mitigate it. This can be done by using a diverse dataset, by carefully selecting the features that are used to train the model, and by using techniques such as adversarial training. It is important to provide annotations that indicate any known biases, possible limitations associated with bias, and any strategies that were used to mitigate bias. Explainability: It is important to be able to explain the decisions that an AI model makes. This is especially important in the case of code annotation, where the decisions made by the AI model can have a significant impact on the quality of the code. There are a number of techniques that can be used to explain the decisions of an AI model, such as visualization and rule extraction.
You can (and should) ask a chatbot to explain its sources and rationale for output that it gave. (Be aware that the citations it gives may or may not be accurate, and you need to verify them by doing your own follow-up literature search.) 5.9 Examples on using AI for annotation 5.9.1 Ask for a README summary To try out some annotation with code examples, this author dug up some old code whose purpose she no longer remembered because time had passed and she hadn’t documented it very well. First, I wanted ChatGPT to make a README for me. READMEs are standard files that explain what a codebase does and summarize what is happening in it. I asked ChatGPT if it could: Create a README for this code and then I pasted the old code. ChatGPT gave me some output that generally looked like a README: This README seems accurate given the work I was doing at the time. But obviously the title isn’t great – it doesn’t describe the code well. Plus there’s not really any information about usage or outputs of the code. So I asked it: Can you make this README more succinct and with sections about how you would use this code? This yielded a much better result. 5.9.2 Ask for line by line annotation Second, I wanted to have ChatGPT annotate code line by line. So I asked it: Can you annotate some R code for me? It not only provided helpful annotations but also fixed spacing and added more spaces between lines of code: It does, however, apparently think the code is Python, judging by the label in the upper left corner. But this doesn’t seem to have affected the accuracy of its annotations. 5.9.3 Ask for how the code could be improved This code is old, I don’t think it runs properly, and it is in no way reproducible. So I wanted to see what ChatGPT would recommend for improvements to the code. I said: How could I make this code better?
It had a lot of great advice: First and foremost it told me my variable names are not consistent and also not descriptive. Which is completely accurate! There are variable names like xx and ll which don’t tell the reader anything about the variable. 5.9.4 Style the code Next, I asked ChatGPT if it could restyle the code into tidyverse style. It definitely added elements that look like tidyverse; however, it also added items to the code that I can tell by looking at them won’t work. So I tried to tell it about the errors: This code is close but there are some errors. These tidyverse functions don’t work with matrices. and it told me: You are correct that the tidyverse functions are designed to work with data frames and not matrices. Here's a modified version of your code that converts the matrix to a data frame and uses tidyverse functions: The lesson here is that the output of ChatGPT still needs to be vetted by the person asking for it. It is a great idea to have AI keep working on something if the result isn’t quite what you are asking for; however, at the end of the day it is you, the human, who has to vet the output. 5.9.5 Sometimes it has trouble with file types if given a URL Copying and pasting code appears to be the best way to go at this point in time. I asked ChatGPT if it could annotate code in a Jupyter notebook at a URL I gave it: Can you annotate the code at this URL? https://github.com/jhudsl/reproducible-python-example/blob/main/make_heatmap.ipynb It responded with annotations for some of the code in that file, but most of the notebook was missing. I told ChatGPT: That isn’t all the code at that url ChatGPT told me: You're right, I apologize. The code at the given URL is actually in a Jupyter Notebook format, so it's not easy to copy-paste here. However, I can suggest an alternative approach: you can use the website nbviewer (https://nbviewer.jupyter.org/) to view the notebook and its code.
Here are the steps to do so: Go to https://nbviewer.jupyter.org/ Paste the URL of the notebook https://github.com/jhudsl/reproducible-python-example/blob/main/make_heatmap.ipynb into the "Enter the location of a Jupyter Notebook" field and press "Go". This will display the notebook and its code, and you can read through it and add your own annotations as needed. I hope this helps! This was a nice tool to recommend, however it still told me I should add my own annotations at the end of this. 5.10 Summary Code annotation is important for readability, maintainability, and overall quality of code AI can be a helpful tool for supplementing annotations or creating annotation for poorly annotated code. Ultimately the code author who uses AI as a tool for annotation is still responsible for verifying that the annotations are consistent with their knowledge of the code. devtools::session_info() ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2023-07-14 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2023-03-28 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.0.2) ## cachem 1.0.7 2023-02-24 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## curl 4.3 2019-12-02 [1] RSPM (R 4.0.3) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) ## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) ## evaluate 0.20 2023-01-17 [1] CRAN (R 4.0.2) ## fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0) ## fastmap 1.1.1 
2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## highr 0.8 2019-03-20 [1] RSPM (R 4.0.3) ## hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0) ## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.0.2) ## httr 1.4.2 2020-07-20 [1] RSPM (R 4.0.3) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2023-03-28 [1] Github (yihui/knitr@a1052d1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.0.2) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## ottrpal 1.0.1 2023-03-28 [1] Github (jhudsl/ottrpal@151e412) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.3) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 [1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## readr 1.4.0 2020-10-05 [1] RSPM (R 4.0.2) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.0 2023-03-14 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2023-03-28 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.2) ## sass 0.4.5 2023-01-24 [1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2023-03-28 [1] Github (R-lib/testthat@e99155a) ## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.0.2) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## utf8 1.1.4 2018-05-24 [1] RSPM (R 4.0.3) ## vctrs 0.6.1 2023-03-22 [1] CRAN (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2023-03-28 [1] Github (yihui/xfun@74c2a66) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["understanding-unfamiliar-code.html", "Chapter 6 
Understanding Unfamiliar Code 6.1 Learning Objectives 6.2 Reading Unfamiliar Code Is A Skill 6.3 AI in Reading Code 6.4 Example One: Summarizing Code You Didn’t Write 6.5 Example Two: Identifying a Coding Language 6.6 Example Three: Interpreting Regex 6.7 Example Four: Demystifying Functions 6.8 Limitations 6.9 Hands-On Exercise 6.10 Summary", " Chapter 6 Understanding Unfamiliar Code 6.1 Learning Objectives Recognize the benefits, limitations, and assumptions in using AI to understand unfamiliar code Consider the ethical questions around using AI to understand another person’s code Practice using AI to summarize code, identify an unknown coding language, and interpret regex and unknown functions 6.2 Reading Unfamiliar Code Is A Skill As a programmer, you will frequently encounter situations where you need to read and understand code written by other developers. This could be because you are collaborating on a project, you have taken over maintenance of an existing codebase, or you are learning new syntax or skills that require reading example codes. Comprehending code written by someone else is a distinct skill from being able to write your own code, and it takes practice to become proficient at it. Reading unfamiliar code is like exploring an unfamiliar city without a map or a guide. Just like in a new city, you may not know where to start or how to navigate the codebase. You may encounter unfamiliar syntax, functions, and libraries that you have never seen before, just as you might encounter new streets, buildings, and landmarks. At first, you may feel disoriented and overwhelmed, and may need to spend some time getting oriented and familiarizing yourself with the environment. As you explore, you may start to see patterns and similarities, just as you might begin to recognize neighborhoods and landmarks in a new city. You may also encounter dead-ends, confusing intersections, and unexpected detours, just as you might encounter bugs and errors in the code. 
To make progress in this unfamiliar territory, you may need to rely on a combination of intuition, deduction, and experimentation. You may need to break down the code into smaller pieces, analyze the behavior of each piece, and gradually build up a mental model of how the code works. You may also need to consult documentation, online resources, or other experts who are familiar with the codebase, just as you might ask locals or consult a map in a new city. In the end, with patience, persistence, and a willingness to learn, you can successfully navigate the unfamiliar code and achieve your goals, just as you can successfully explore a new city and discover its hidden treasures. Learning how to read and understand code written by someone else is a valuable skill that can help you be a more effective programmer. By being able to read and comprehend unfamiliar code, you can contribute to projects more quickly, diagnose and fix bugs more effectively, and learn new programming techniques and best practices. As with any skill, the more you practice reading and understanding code written by others, the more comfortable and proficient you will become at it. 6.3 AI in Reading Code While anyone can learn to understand code written by someone else, using AI language models to achieve this has three significant advantages. Summarize large codebases. AI can quickly scan large codebases and give you a high-level overview of what they do. This can save you a lot of time and effort, especially when dealing with complex codebases. For example, AI can identify the main functions and classes in a codebase, and it can show you how they are related. Analyze the code. AI can look at the code and tell you about its structure, complexity, and potential defects quickly and efficiently. This can help you understand the code better and make it easier to maintain and improve. 
For example, AI can identify parts of the code that are difficult to understand or maintain, and it can suggest ways to improve the code. Understand comments and documentation. AI can read comments and documentation written by other developers. This can help you understand the code’s purpose and the assumptions and limitations of the codebase. For example, AI can identify the key assumptions that are made in the code, and it can explain the trade-offs that were made in the design of the code. Keep these ethical considerations in mind when using AI to summarize code. Who owns the code? If you are not the owner of the code, you should get permission from the owner before summarizing it. What is the purpose of the summary? If you are summarizing the code for educational or research purposes, it is likely ethical to do so. However, if you are summarizing the code for commercial purposes, you may need to get permission from the owner. Does the code contain any sensitive or personal information? If the code contains any sensitive or personal information, you should take special care to protect that information. 6.4 Example One: Summarizing Code You Didn’t Write Here’s some simple code using the R programming language, written by an unnamed programmer. If you are familiar with R, you might be able to understand what this code does with little effort. However, if you don’t know R, you’re relatively new to R, or you’re just having a bad day, understanding what this bit of code does can take a great deal more energy and time. AI language bots can ease that burden.
library(tidyverse) d <- ggplot2::diamonds d_sel <- d %>% select(carat, cut, color, clarity, depth, table, price) d_fil <- d_sel %>% filter(price > 5000) d_bin <- d_fil %>% mutate(carat_bins = cut_width(carat, width = 0.2)) d_summ <- d_bin %>% group_by(carat_bins, cut, color, clarity) %>% summarize(mean_price = mean(price), sd_price = sd(price)) We first ask Claude-instant (as implemented by Poe) to explain the purpose of this code. What does this code do? You can ask for Claude to summarize each step in detail as well. Can you explain what each step does in detail? Maybe you’ve never worked with the tidyverse package in R - Claude can tell you about it. What is the tidyverse package? You might also not be familiar with the diamonds dataset (especially if you’ve never used it before) and want some clarification on where the data are from. Can you tell me more about the diamonds dataset? 6.5 Example Two: Identifying a Coding Language Sometimes you might have to work with legacy code. Legacy code can be difficult to work with, especially if it is written in a language or style with which you are not familiar. There are more than 700 programming languages in use today, so it is impossible for any programmer to know them all. AI can be a helpful tool for identifying the language and version of legacy code, which can make your life just a little easier. Let’s look at example code that might have been written decades ago. 
program temperature_smog_analysis; uses Math; const n_temperatures = 1000; n_smog_measures = 500; type TemperatureArray = array[0..n_temperatures-1] of integer; SmogArray = array[0..n_smog_measures-1] of integer; var temperatures: TemperatureArray; smog_measures: SmogArray; combined_data: array of integer; slope, y_intercept: double; i, j: integer; function connect_to_database(filename: string): integer; begin { implementation of connect_to_database function } end; function retrieve_temperatures(fd: integer): integer; begin { implementation of retrieve_temperatures function } end; function retrieve_smog_measures(fd: integer): integer; begin { implementation of retrieve_smog_measures function } end; function retrieve_temperature(fd, index: integer): integer; begin { implementation of retrieve_temperature function } end; function retrieve_smog_measure(fd, index: integer): integer; begin { implementation of retrieve_smog_measure function } end; procedure close_database(fd: integer); begin { implementation of close_database function } end; We can ask Bard to take a guess at what the coding language might be. What language is this code written in? We can explore further to figure out what indicators tell Bard the language is Pascal. How do you know this is Pascal? It turns out there are multiple versions of Pascal. Since you will presumably need to work with this legacy code, you might want to know which version it is. What version of Pascal is this? 6.6 Example Three: Interpreting Regex Regex (short for Regular Expressions) is a powerful tool used to describe patterns in text. It’s a sequence of characters that define a search pattern. Regular expressions can be a powerful tool for data cleaning, text mining, and data validation. They are widely used in web development, data science, and other fields where text processing is important. They can also be tricky to understand at first because they involve a specific syntax that can be complex. 
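As a small illustration of the pattern-matching idea described above, here is a minimal sketch in Python using the standard library's re module. The date pattern, the helper names find_dates and mask_digits, and the sample strings are assumptions made up for this example; they do not come from the course materials.

```python
import re

# Hypothetical pattern for illustration: a date written as YYYY-MM-DD.
# \d{4} matches exactly four digits; the parentheses capture each part.
DATE_PATTERN = re.compile(r'(\d{4})-(\d{2})-(\d{2})')


def find_dates(text):
    """Return a list of (year, month, day) string tuples found in text."""
    return DATE_PATTERN.findall(text)


def mask_digits(text):
    """Replace every run of digits with '#', a simple data-cleaning step."""
    return re.sub(r'\d+', '#', text)


sample = 'Sampled on 2023-07-14, re-run on 2023-07-17.'
print(find_dates(sample))
print(mask_digits('Patient 42, visit 3'))
```

The first call shows text mining (pulling structured pieces out of free text), and the second shows data cleaning (rewriting whatever matches a pattern). The raw-string prefix r keeps the backslashes intact so that the regex engine sees them.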
We can use ChatGPT to explain regex in a bit of someone else’s code. x = sub("/d/e","/d",x) What does this do? 6.7 Example Four: Demystifying Functions Sometimes we get handed code that includes complex architecture. Perhaps you are working with code you wrote years ago when you really liked loops, but present you finds them difficult to parse. (Be nice to past you - maybe you were a baby programmer and didn’t know better.) Instead of giving yourself a headache, you could turn to ChatGPT to explain what your old code does. def my_function(x): result = x for i in range(10): for j in range(5): result = result + 2 * (i + 1) * (j + 1) * (i % 2 == 0 and j % 2 == 0) - 1 return result What does this function do? Well, now you have an idea what past-you might have been attempting to do with this code. You can also query AI about the potential problems from using the code as-written. What are some possible issues with this code? (And if you do ever run across an expression like this in your code, you can check out our chapter on refactoring code using AI to fix it!) 6.8 Limitations Although chatbots can be extremely helpful for better understanding someone else’s code, if there is a lack of annotation or documentation, it will be difficult to discern more about the context of the development. Cases where developers may not have provided context information include: Not stating their purpose for writing this code. Not describing what they planned to do next. Not describing (or being aware of) possible parts of the code that need updating or maybe have security or privacy issues. Not describing how they made certain decisions in the development process. Although we can’t truly understand some of this information, it is however possible to get some assistance from chatbots with prompts such as: What do you think the purpose of this code is? What might be possible next steps to further develop this code? 
What are possible issues with this code, particularly for security or privacy? Why was the code possibly written with this structure? What other options are possible? Chatbot tools are also limited in terms of how up-to-date their training data is to know about current possible issues with code. 6.9 Hands-On Exercise Now it’s your turn to try. Let’s say you were handed this bit of code to work with (and eventually modify), but first you need to figure out what’s going on with it. Unfortunately, the original programmer left very little in the way of notes or annotation. 6.9.1 The Code Note: This code is just an example and was written strictly for educational purposes. use ode_solvers::{Euler, OdeMethod}; use plotters::prelude::*; use std::fs::File; const N: f64 = 1000000.0; const I0: f64 = 10.0; const R0: f64 = 0.0; const S0: f64 = N - I0 - R0; const BETA: f64 = 0.3; const GAMMA: f64 = 0.1; fn sir_model(t: f64, y: &[f64], v: f64) -> Vec<f64> { let s = y[0]; let i = y[1]; let r = y[2]; let dsdt = -BETA * s * i / N - v * s; let didt = BETA * s * i / N - GAMMA * i; let drdt = GAMMA * i + v * s; vec![dsdt, didt, drdt] } fn main() { let mut solver = Euler::new(sir_model); solver.set_initial_condition(&[S0, I0, R0]); let root = BitMapBackend::new("sir_vaccination.png", (800, 600)).into_drawing_area(); root.fill(&WHITE).unwrap(); let mut chart = ChartBuilder::on(&root) .caption("SIR model with vaccination", ("sans-serif", 40)) .set_label_area_size(LabelAreaPosition::Left, 60) .set_label_area_size(LabelAreaPosition::Bottom, 40) .build_cartesian_2d(0.0..100.0, 0.0..N) .unwrap(); let mut data1 = vec![]; for _ in 0..1000 { let y = solver.integrate(0.1, 0.0); data1.push((solver.time(), y[0], y[1], y[2])); } let vac_rate = 0.1; let vac_coverage = 0.5; let vac_num = vac_coverage * N; let mut data2 = vec![]; for i in 0..1000 { let t = i as f64 * 0.1; let v = if t >= 50.0 && t < 150.0 {vac_num * vac_rate} else {0.0}; let y = solver.integrate(0.1, v); data2.push((solver.time(), 
y[0], y[1], y[2])); } chart .configure_mesh() .x_labels(10) .y_labels(10) .disable_x_mesh() .disable_y_mesh() .draw() .unwrap(); chart .draw_series(LineSeries::new( data1.iter().map(|d| 6.9.2 Questions Write a query that identifies the features that can be used to determine the coding language, as well as the version of the language. What language and version is it written in? What features give away the coding language? Create a prompt that determines what this code does, as well as important features of the program. What does this code do? Write a prompt that determines the purpose of this specific code snippet: let root = BitMapBackend::new("sir_vaccination.png", (800, 600)).into_drawing_area(); What is the purpose of this line of code? 6.10 Summary Understanding code written by someone else can be made easier by using AI AI chatbots can summarize the code and the dataset used, as well as clarify functions and expressions that are confusing AI chatbots can also identify the coding language and version used in legacy code Be mindful of who owns the code and what the summary will be used for when using AI to summarize someone else’s code "],["ai-for-bioinformatics.html", "Chapter 7 AI for Bioinformatics 7.1 Learning Objectives 7.2 AI in Bioinformatics 7.3 Using these topics for Bioinformatics 7.4 Hands-On Exercise 7.5 Summary", " Chapter 7 AI for Bioinformatics 7.1 Learning Objectives Understand the potential applications of AI assistance in bioinformatics software development Identify the challenges and considerations when using AI in bioinformatics software development Grasp the serious ethical concerns when designing bioinformatics software that deals with sensitive data 7.2 AI in Bioinformatics Bioinformatics is an interdisciplinary field that combines biology, computer science, statistics, and data analysis to extract insights from biological data. 
It involves the development and application of computational tools, algorithms, and databases to store, organize, analyze, and interpret vast amounts of biological information, particularly in the context of genomics and public health. Biomedical imaging and electronic health record informatics are also rapidly growing subdisciplines. Bioinformatics is especially important for research and work done with genomes. In genomics, bioinformatics enables the identification and annotation of genes, the prediction of protein structures and functions, and the analysis of genetic variations, such as single nucleotide polymorphisms and gene copy number variations. These analyses help in unraveling the genetic basis of diseases, discovering potential drug targets, and facilitating personalized medicine approaches. The proliferation of genomic data in recent years has been transformative for the field of genomics. Advances in DNA sequencing technologies, particularly next-generation sequencing, have revolutionized the ability to generate vast amounts of genomic data quickly and at a reduced cost. This has led to an unprecedented increase in the availability and accessibility of genomic information. Researchers estimate that we will generate between 2 and 40 exabytes of genomic data within the next decade. Cloud computing platforms like AnVIL are at the forefront of addressing the big genomic data storage and processing challenge. You have seen principles in previous chapters that likewise apply to creating bioinformatics software. Creating bioinformatics software, however, comes with its own unique set of challenges: Data Complexity: Genomic data is vast and complex, often consisting of large datasets with multiple dimensions, such as nucleotide sequences and quality scores, gene expression levels, and epigenetic modifications.
Developing algorithms and software tools to efficiently handle and analyze this data requires expertise in data management, processing, and interpretation, as well as biological domain knowledge. Diverse Data Types: Bioinformatics software often handles diverse and complex data outside of strings and numeric types, such as Bioconductor’s SingleCellExperiment data class. Knowledge of common data types used in bioinformatics is required to create functional software. Algorithmic Complexity: Developing accurate and efficient algorithms for various bioinformatics tasks, such as sequence alignment, gene prediction, and protein structure prediction, requires deep understanding of biological principles and computational techniques. Designing algorithms that can handle noise, errors, and biological variations is a persistent challenge. Integration with Existing Workflows: Bioinformatics software is often part of a larger analysis pipeline or workflow. Integrating AI-based tools seamlessly into existing bioinformatics pipelines and workflows, ensuring compatibility, and maintaining data integrity throughout the process requires special attention. Ethical Considerations: People are typically more sensitive about their health data when compared to their Amazon purchases. This is for a good reason: there is a painful history of the medical establishment performing experiments on people with less socio-political power. Like medical record data, genomic data is supposed to be carefully protected so that it can’t be used to disadvantage certain individuals. AI tools, on the other hand, are not typically designed with security and data sovereignty in mind. 7.3 Using these topics for Bioinformatics When using AI to help generate code examples, perform refactoring, understand code, or create unit tests, you should keep the following in mind: Writing Code: AI can be used to automatically generate bioinformatics code snippets or functions.
It can provide code suggestions, auto-completion, or even generate code based on given specifications or data classes. This can include creating some unit tests for your functions to ensure they are working as expected. Refactoring Code: AI can aid in refactoring bioinformatics code to improve readability, maintainability, and performance through suggestions for code restructuring. Annotating Code: Bioinformatics tools can be used for clinical decision making. It’s important that anyone vetting or auditing your tool can trace any decision making steps. Therefore, clear understanding and documentation are key. AI can assist in drafting documentation and code annotations, such as through summarizing code functionality, clarifying complex expressions or functions, and/or providing plain-terms explanations of the underlying algorithms. Understanding Code: Although bioinformatics is rapidly evolving, its often specific use cases mean that sometimes developers have to work with code that is quite old or outdated. AI can help identify the coding language and version used in legacy bioinformatics code. This can be valuable when working with older software or when integrating different codebases. NEVER submit chunks of code to AI tools without being 100% sure that they are free from personally identifiable information (PII). PII includes variables like gender, race, birth date, geographic location, and disease diagnosis. If you need to, generate dummy data to test your code. 7.4 Hands-On Exercise Now it’s your turn to try. 7.4.1 The Code You are working on improving an alignment algorithm. Alignments are typically used for comparing samples from individuals to a reference consensus sequence. To make sure things are working, you want to compare the alignment created by an existing function (pairwiseAlignment) to an alignment created by your tool (my_new_Aligner). Note: This code is just an example and was written strictly for educational purposes. 
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") if (!requireNamespace("Biostrings", quietly = TRUE)) BiocManager::install("Biostrings") library(Biostrings) reference <- DNAString("ATCGTACGTA") query <- DNAString("ATCGTAGCTA") # Perform pairwise sequence alignment alignment <- pairwiseAlignment(reference, query) # Perform sequence alignment using your algorithm my_alignment <- my_new_Aligner(reference, query) # Compare pairwise alignment to my_alignment my_evaluation <- my_alignment_evaluator(reference, query, alignment, my_alignment) 7.4.2 Questions Things are going great with the toy data! You decide to make a tiny subset of your data for testing your aligner by copying the top 100 sequences from patient 1. You then submit the aligner and data to Bard. You prompt, “is there anywhere handling this data can be made more efficient?” Why is this problematic? Can AI be used to improve annotation in the code above? Why or why not? You prefer writing loops, but have learned recently that R works more efficiently using apply() functions. Is it okay to ask ChatGPT to refactor a small loop into an apply() function? Why or why not? A colleague gives you a flash drive with some code from years ago. They swear it contains some useful snippets that will improve your aligner, but it’s in a language you haven’t worked with before. You inspect the code, and don’t believe it contains protected data. The algorithm was published many years ago in a leading journal. Is it okay to ask AI what the code is doing? 7.5 Summary Like other types of software, bioinformatics software and tools can benefit from AI assistance for writing, refactoring, annotating, and understanding code. Biological data and workflows can be complex. AI does not replace a thorough understanding of biological processes at play. NEVER submit any protected or sensitive data to AI applications. 
This can constitute a serious, often illegal breach of subject privacy if you are dealing with data from people. "],["languages-and-cloud-computing.html", "Chapter 8 Languages and Cloud Computing Programming Languages AI and Cloud Computing", " Chapter 8 Languages and Cloud Computing Programming Languages Here are commonly used languages and their typical use cases. Python Known for its readability, simple syntax, and comprehensive add-on modules. It is often recommended as a beginner-friendly language. Python is used for scientific computing, data analysis, machine learning, and web development. Many data scientist roles rely on Python. Ruby Similar to Python, Ruby is also known for its readability and user-friendly syntax. It is typically used for web development, building dynamic websites, and creating web applications. JavaScript A popular language for web development, known for its versatility and ease of use. Many of the interactive features of websites rely on JavaScript. Java Java is widely used for enterprise applications, Android mobile apps, and web apps. PHP PHP is easy to learn but can become complex as the application grows. It is typically used for building dynamic web pages, web applications, and content management systems. R Used for statistical computing, data analysis, and data visualization, R is a popular choice for researchers and data scientists. Some beginners find its syntax complex, especially due to differences in the two major dialects: “base” R versus the “tidyverse”. Kotlin Kotlin is a relatively new language that is designed to be more concise and expressive than Java. However, its advanced features and concepts may be challenging for beginners. It is used for developing Android mobile apps, web development, and server-side development. Swift Used for iOS and macOS app development, Swift is known for its strict syntax and steep learning curve.
C# Often used for Windows desktop applications, web applications, and video games, C# can be challenging due to its more complex syntax and strict type system. C++ A powerful language often used for developing high-performance applications and operating systems, C++ can be challenging to learn due to its complexity and the need to manage memory manually. Julia Julia is designed for numerical and scientific computing, data analysis, and machine learning. It has a syntax that is easy to learn and read, making it an ideal choice for data scientists, engineers, and researchers. Julia code can also be compiled to native machine code using the Julia compiler. Because Julia is a relatively new language, it has fewer add-on libraries and resources available compared to more established languages. MATLAB MATLAB is a high-level programming language and development environment commonly used for numerical computation, data analysis, and visualization. It is widely used in engineering, physics, and finance, among other fields. It is generally considered to have a user-friendly syntax and is relatively easy to learn compared to other languages. MATLAB has a development environment called MATLAB Compiler that allows you to package MATLAB code into standalone executables or shared libraries for use on a wide variety of operating systems. Fortran Fortran has been around for decades and has a long history in scientific and engineering communities. It has a more challenging syntax and is considered harder to learn than some of the other languages on this list. It is primarily used for scientific and engineering applications that require numerical computations and simulations such as computational physics, weather forecasting, and computational chemistry. Rust Rust is a systems programming language that focuses on speed, reliability, and memory safety. It is commonly used for developing performance-critical software, such as operating systems, web browsers, and game engines. 
Rust’s syntax is similar to that of C++, but Rust is often considered to have a steeper learning curve due to its focus on memory safety and ownership concepts. AI and Cloud Computing AI in cloud computing is an exciting and rapidly advancing field that combines artificial intelligence (AI) and cloud computing technologies. Cloud computing provides on-demand access to computing resources over the internet, such as storage, processing, and analytics, while AI uses algorithms and models to enable machines to perform tasks traditionally requiring human intelligence. Developers can easily incorporate AI into applications using cloud-based AI services. For example, Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform offer AI services like speech recognition, image recognition, natural language processing, and machine learning. AI can also help optimize cloud resources and performance. It can allocate resources based on workload demands, optimize power usage, detect threats, and improve application latency by predicting user behavior and preloading content. However, integrating AI and cloud computing presents challenges around data, security, privacy, ethics, complexity, and management that must be addressed carefully. Large amounts of data are needed to train AI models, especially deep learning models, which require vast amounts of labeled data to identify patterns and make predictions accurately. Storing and processing data at this scale can be expensive and can affect the scalability and performance of the cloud environment. Ensuring data security, privacy, and ethical use also requires consideration. AI and cloud computing are both complex and require specialized skills and knowledge to use effectively. Integrating them increases that complexity, which makes management and maintenance more challenging. Careful planning and execution involves evaluating the benefits and limits of AI, selecting appropriate models and algorithms, and determining how to integrate them into the cloud environment.
Ongoing monitoring of model performance enables adjustments that optimize performance and efficiency. As both AI and cloud computing technologies advance, integrating the two can provide on-demand access to advanced AI capabilities, optimize resources, facilitate smarter applications, and yield data-driven business insights. Integration challenges around data, security, ethics, and complexity must be managed proactively through evaluation, planning, monitoring, and adjustments to maximize benefits and minimize limitations. When implemented responsibly, AI in cloud computing enables remarkably intelligent and scalable systems that can transform organizations and society. "],["about-the-authors.html", "About the Authors", " About the Authors These credits are based on our course contributors table guidelines. Credits Names Pedagogy Lead Content Instructor Elizabeth Humphries (Writing Code, Understanding Unfamiliar Code) Content Authors Candace Savonen (Annotating Your Code), Carrie Wright (Ethics of Using AI), Ava Hoffman (Refactoring Code) Content Editor Reviewer Howard Baek Technical Template Publishing Engineers Candace Savonen, Carrie Wright, Ava Hoffman Publishing Maintenance Engineer Candace Savonen Technical Publishing Stylists Carrie Wright, Ava Hoffman, Candace Savonen Package Developers (ottrpal) Candace Savonen, John Muschelli, Carrie Wright Art and Design Figure Artists Candace Savonen, Carrie Wright, Ava Hoffman, Elizabeth Humphries Funding Funders CS, CW, and HB were supported by the National Cancer Institute (NCI) under Grant UE5CA254170. AMH and EMH were supported by the AnVIL Project through cooperative agreement awards from the National Human Genome Research Institute with cofunding from OD/ODSS to the Broad Institute (U24HG010262) and Johns Hopkins University (U24HG010263).
Funding Staff Sandra Ormbrek, Shasta Nicholson   Tools Used: ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2023-07-14 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2023-03-28 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.0.2) ## cachem 1.0.7 2023-02-24 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) ## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) ## evaluate 0.20 2023-01-17 [1] CRAN (R 4.0.2) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.0.2) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2023-03-28 [1] Github (yihui/knitr@a1052d1) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 [1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.0 2023-03-14 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2023-03-28 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.2) ## sass 0.4.5 2023-01-24 
[1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2023-03-28 [1] Github (R-lib/testthat@e99155a) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2023-03-28 [1] Github (yihui/xfun@74c2a66) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["references.html", "References", " References "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] +[["index.html", "AI for Software Development Harnessing the Power of Large Language Models About this Course Available course formats Suggested Activity Context Learning Objectives Lesson Breakdown", " AI for Software Development Harnessing the Power of Large Language Models July, 2023 About this Course This course on AI for software development explores the use of AI large language models such as ChatGPT, Bard, and others and their potential benefits and challenges. Through examples and hands-on activities, students will develop an understanding of the ways in which AI can speed up software development tasks and free up time for more creative and strategic work. By the end of the course, students will be equipped to navigate the rapidly changing landscape of software development and use AI chatbots in a way that maximizes benefits and efficiency while limiting harm as much as possible. Available course formats This course is available in multiple formats which allows you to take it in the way that best suits your needs. The material for this course can be viewed without login requirement on this Bookdown website. 
This format might be most appropriate for you if you rely on screen-reader technology. Our courses are open source; you can find the source material for this course on GitHub. The slides for the images in the course can be found here. Suggested Activity Context Course Audience Professionals looking to improve efficiency Students hoping to learn more about programming Anyone curious about how AI can be harnessed for technology Note: Those completely new to programming will find using AI tools to create software solutions challenging at this time. It is helpful to have some fundamental knowledge to write appropriate prompts and to discern when code is not working as expected. We advise novices to seek expert review. Course Prerequisites Some previous exposure to programming recommended. Assessment Type Hands-on activities following each lesson. Learning Objectives Learners will be able to: Explain the basics of AI and its potential for improving software development workflows Implement strategies to write and debug code using AI-based tools Describe the benefits of refactoring code using AI-powered techniques, such as making code readable, keeping it brief, and optimizing code Apply best practices for annotating code using AI Recognize strategies for using AI-based tools to understand and analyze code, such as code comprehension of unfamiliar languages or functions Discuss the challenges and ethical implications of using AI for different aspects of software development Lesson Breakdown Activity Duration Introduction 10 minutes Ethics for Using AI 20 minutes Writing Code 35 minutes Writing Exercise 15 minutes Refactoring Code 35 minutes Refactoring Exercise 15 minutes Annotating Code 25 minutes Understanding Code 25 minutes Understanding Exercise 10 minutes Total 3 hours & 10 minutes "],["introduction.html", "Chapter 1 Introduction 1.1 Learning Objectives 1.2 AI Provides New Opportunities 1.3 AI Code of Ethics 1.4 The AI Chatbots 1.5 Reproducibility With AI Chatbots
1.6 How to Use This Book 1.7 About This Book", " Chapter 1 Introduction The increasing availability of AI chatbots is creating a new landscape in software development and data science. It is important to be aware of the potential benefits and challenges of this technology, so that we can use it in a way that benefits everyone and limits harm as much as possible. On the one hand, AI chatbots have the potential to replace or significantly reduce the workload of programmers. This could lead to job losses in the data science industry. Additionally, there are significant ethical and privacy concerns regarding the use of training data sets to teach these AI systems. For example, if a chatbot is trained on a dataset of biased or discriminatory data, it could perpetuate those biases in its interactions with users. On the other hand, AI chatbots also have the potential to create new jobs and opportunities. For example, chatbot developers will be needed to create and maintain these AI systems. Additionally, AI chatbots could be used to automate tasks that are currently done by humans, freeing up programmers to focus on more creative and strategic work of interpreting what the data actually mean. Ultimately, while the future of programming in software development and data science is unclear, there is much to be optimistic about in this brave new world. The advancements in AI and chatbot technology present an opportunity for programmers to work alongside algorithms to create innovative solutions that were previously unimaginable. As with any technological revolution, there will be challenges, but it is up to us to work together to ensure that we use these tools ethically and responsibly, and to create a future that benefits everyone. 
1.1 Learning Objectives In this chapter we will demonstrate how to: Consider the potential benefits and limitations of AI for software development Recognize how to use this course, how it was written, and what to expect Explain general concepts for ethical use of AI tools for software development Summarize what Chatbots are and how they work Distinguish between different AI options for software development 1.2 AI Provides New Opportunities We do not believe that AI chatbots will replace programmers, software developers, and data scientists. While AI can be powerful and useful, it is only as good as the training data behind it and the human using it. Therefore, for the time being and the foreseeable future, humans will remain at the forefront of programming innovation, with AI providing valuable support. As new code is often built on top of existing code written by humans, it is unlikely that AI will fully replace the need for human programmers. However, we do think that the roles of programmers in software development and data science are changing, and some of the skills required to work in this new environment are slightly different from those needed before. In addition to proficiency in programming syntax, programmers may need to develop a more refined ability to construct queries that enable AI bots to create the desired workflows. Other critical skills such as debugging, troubleshooting, and documentation will continue to be valuable, and can be enhanced through the use of AI. Furthermore, AI creates new opportunities and niches for tasks that were previously difficult, time-consuming, or impossible. This is an exciting time for those working in programming, with great potential to explore new and innovative solutions. As AI technology continues to advance, its role in programming will only become more significant, and it is up to developers to adapt and learn how to use these new tools effectively. 
We asked OpenAI’s ChatGPT whether AI chatbots would replace human software developers, data scientists, and programmers, and this was the response: “It is unlikely that AI chatbots will replace software developers, data scientists, and programmers entirely. While AI chatbots can perform certain tasks related to software development, data analysis, and programming, they are limited in their abilities and require human oversight and input to function properly. AI chatbots can be used to automate certain tasks such as generating simple code snippets, running tests, and performing simple data analysis. They can also assist in the development process by providing quick answers to commonly asked questions and suggesting possible solutions to programming problems. However, more complex software development, data analysis, and programming tasks require a level of creativity, critical thinking, and problem-solving that AI chatbots cannot currently replicate. Software developers, data scientists, and programmers have a deep understanding of the technologies and tools used to develop AI chatbots, as well as the ability to optimize and improve those tools. Furthermore, software development, data analysis, and programming require a high level of human communication and collaboration, which AI chatbots cannot currently replicate. Humans are also essential in making ethical decisions related to AI development, such as ensuring that AI systems are transparent, fair, and unbiased. In summary, while AI chatbots can provide valuable assistance to software developers, data scientists, and programmers, they are unlikely to replace them entirely. Human expertise, creativity, and critical thinking will continue to be essential in developing and optimizing AI systems for the foreseeable future.” 1.3 AI Code of Ethics Our first duty as programmers is to avoid causing harm to people and the world. This includes avoiding both intentional and inadvertent harm. 
Intentional harm is harm that is caused by a programmer’s deliberate actions. For example, a programmer might create a virus that is designed to harm computers. Inadvertent harm is harm that is caused by a programmer’s negligence or lack of foresight. For example, a programmer might create a program that is buggy and causes data loss, or create a program that is particularly vulnerable to being hacked. We must always be aware of the potential for harm and deliberately take steps to mitigate it. This includes things like: Be aware of the potential for bias. AI systems are trained on data, and that data can reflect the biases of the people who collected it. Programmers should be aware of the potential for bias in their data and take steps to mitigate it. Be transparent about your work. Share information about your code, data, and algorithms. This will help others to understand how your work works and to identify any potential problems. Be accountable for your work. Take responsibility for the consequences of your code. Be willing to make changes if necessary. Participate in discussions about AI ethics. There are many online and in-person communities where you can discuss the ethical implications of AI with other people. Advocate for ethical AI. Speak up when you see AI being used in a way that you believe is unethical. Support organizations that are working to promote ethical AI. 1.4 The AI Chatbots Humans have been interacting with AI chatbots for years. In fact, Alan Turing is credited with coming up with the concept for chatbots as early as 1950. Chatbots are software-based systems that interact with humans typically by text or speech inputs, rather than code. They mimic some human activity (“Chatbot” 2023; Abdulla et al. 2022) based on these language inputs. They process the inputs using natural language processing commonly abbreviated as NLP. 
NLP is a kind of AI that uses human text or speech and parses the language to determine structures and patterns to extract meaning. NLP uses large amounts of language data (such as books, websites etc.) to train AI systems to identify these structures and patterns. For example, the AI model might identify when a sentence is a question or a statement by examining various features in a prompt such as the inclusion of a question mark or the use of words often used in questions (“Natural Language Processing” 2023; Cahn 2017). The methods used for chatbots have evolved over time. Now chatbots often utilize AI methods like deep learning (which involve multiple layers of abstractions of the input data (“Deep Learning” 2023)) to extract meaning from the language data (“Natural Language Processing” 2023). As these methods use large quantities of text, they are therefore often called large language models (“Large Language Model” 2023). Despite the fact that chatbots have been around for a while, the popularity of OpenAI’s ChatGPT and DALL-E programs has sparked a recent surge of interest. These chatbots are in part particularly powerful due to the fact that large amounts of computing power were used to train their NLP models on very large datasets (Caldarini, Jaf, and McGarry 2022; Cahn 2017). Large language model AIs can be divided into two categories: those that can be reached using an internet browser, and those that can be reached using an integrated development environment (IDE). The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown. Err on the side of caution when interacting with them.
We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure. This is not a comprehensive list! Browser-based chatbots Bard is Google’s large language model, also known as a conversational AI or chatbot trained to be informative and comprehensive. It is trained on a massive amount of text data and is able to communicate and generate human-like text in response to a wide range of prompts and questions. You will need to join the waiting list before getting access, although in our experience the wait time is short. ChatGPT is OpenAI’s large language model based on the GPT (Generative Pre-trained Transformer) architecture. It is designed to understand and generate human-like responses to text-based conversations. You will need to create an account before using it. Phind is an AI-powered search engine designed specifically for developers asking technical questions. In addition to answering your coding question, it will provide links to the internet sources used to find the answer. Poe is Quora’s platform that gives you access to multiple AI chatbots, including Sage, based on OpenAI’s GPT-3.5 architecture; Claude, an AI natural language bot created by Anthropic; and ChatGPT. IDE-based chatbots CoPilot by GitHub is a language bot that interfaces with your editor. This bot provides coding suggestions and ways to speed up testing. GitHub describes it as an “AI pair programmer”. It uses a subscription model with both individual and business tiers. CodeWhisperer is the Amazon Web Services chatbot for the AWS cloud platform. It can be installed in your integrated development environment (IDE) using the latest AWS Toolkit plugin. The individual tier for CodeWhisperer is free, although you will need to create an AWS account and add a payment method. What are the data sources for these chatbots?
Some high-profile English-language AIs, including Google’s T5 and Facebook’s LLaMA, were trained on more than 15 million websites (Schaul, Chen, and Tiku 2023). The top sources of training data for these AI models were https://patents.google.com/ and https://www.wikipedia.org/ ! 1.5 Reproducibility With AI Chatbots Reproducibility in programming is the ability to reproduce the results of an analysis using the same data and methods. It’s an important aspect of working with AI tools as well, particularly when using AI to debug or fix code. In order to maximize the reproducibility of your work, you should create good habits when working with any AI tool. This includes documenting your prompts and the intentions behind them, as well as saving the response threads generated by an AI chatbot. Some of the current chatbots already save your threads (particularly ChatGPT and Phind). At this time, Bard and the chatbots accessible via Poe do not save queries as individual threads but do keep a record of the questions and responses for some time. However, it is best practice to save them yourself as the saved chat threads can be reset. Saving them yourself also allows you to write notes about your intentions. Keep in mind that the answers from the chatbots can change over time - don’t expect them to always be the same! Phrasing a query slightly differently can result in a different answer, as can asking in a new session. You should expect to ask the AI tool the same question in multiple ways in order to be confident in the answer. 1.6 How to Use This Book This book is written with the expectation that users will follow along and paste the queries from each example into their own AI chatbot session. After you have finished this book, you will be able to use whichever combination of chatbot and programming environment that you’re comfortable using. 1.6.1 Choice of AI Model You can use whichever AI you like for this course, as long as it is a large language model. 
(Other types of AIs are not explored in this course.) We will use a variety of browser-based chatbots as examples throughout this course. Some AIs will be better suited for your particular tasks, and it is too early to judge which of the technologies will end up dominating the industry. We encourage you to test multiple AI platforms to find the ones that best fit your needs, and the use of a particular AI in each chapter should not be taken as an endorsement by us. We have focused on the browser-based chatbots for our examples in this course for ease of demonstration. 1.6.2 Computing Environment Needs All the examples and activities in this course can be done in the computer environment you prefer. None of the tasks require a huge amount of computing power, so local installations on your personal computer are fine, though everything will also translate to a high performance computing cluster or a cloud computing platform. In addition to the CodeWhisperer chatbot available on the AWS cloud platform, a variety of cloud-based AI tools and platforms exist that can be leveraged for practical projects after this course. For instance, Amazon SageMaker or Google Cloud AI Platform can be used for constructing and training machine learning models, while Microsoft Azure Cognitive Services can be applied to incorporate natural language processing features into your applications. If you’d like to learn more about how AI can be integrated with cloud computing, you can read about it in the Additional Learning section. 1.6.3 Boxes Throughout this course, you will see several types of specially-colored boxes. In particular, there are two to which you should pay special attention. One box highlights potential ethical issues with an AI usage. In this box, you’ll find reminders of the ethical considerations you should be particularly aware of while using AI to do certain tasks. The other box highlights our queries to AI chatbots. 
The question we posed to the AI chatbot is written in these boxes to make it easier for you to find. Wherever possible, we use screenshots of the actual answers we got from the AI chatbots in response to our questions. Occasionally when we have asked for the AI to write or fix code, we have formatted the answer in such a way that makes it easier to use. When you see this format, you will know it indicates an AI response. Code in these boxes was written by AI. 1.7 About This Book We used AI bots (including ChatGPT, Bard, Sage, and Claude) to develop and write much of this course. We firmly believe this new technology can be used in ethical ways and we put into practice all the rules and suggestions offered over the following modules while we created this book. "],["ethics-of-using-ai.html", "Chapter 2 Ethics of Using AI 2.1 Learning Objectives 2.2 Major concerns 2.3 Replacing Humans 2.4 Bias 2.5 Misinformation 2.6 Unintelligible Code 2.7 Faulty or Buggy Code 2.8 Security and Privacy issues 2.9 Violating Copyright 2.10 Harmful code 2.11 Summary", " Chapter 2 Ethics of Using AI The use of artificial intelligence (AI) and in particular, generative AI, in coding has raised a number of ethical concerns. We will highlight several current concerns; however, please be aware that this is a dynamic field and the possible implications of this technology are continuing to develop. It is critical that we as a society continue to evaluate and predict what the consequences of the use of AI will be, so that we can mitigate harmful effects.
2.1 Learning Objectives In this chapter we will demonstrate how to: Describe key ethical concerns for using AI tools in software development Identify possible mitigation strategies for these major concerns Explain the potential benefits of being transparent about the use of AI tools to write code Discuss why human contributions are still important and necessary Recognize strategies for using AI tools to write code more responsibly Implement prompts to ask AI tools about responsible use throughout various coding practices 2.2 Major concerns In this chapter we will discuss the following issues that writing code with AI tools may contribute to: Replacing Humans - AI tools can help humans write code, but they are not a replacement. Bias - AI models are built on data and code that were created by biased humans; this bias can be further perpetuated. Misinformation - fake or manipulated data used to help design algorithms could be believed to be correct and this could be further propagated. Unintelligible Code - Developers need to follow best practices with code generated by AI. Just like code generated by human developers, this includes figuring out how the code works, how it integrates with other code, as well as annotating and documenting how to use the code. Faulty or Buggy Code - Code that may not be optimal for a given situation may be inadvertently used by those with less familiarity, which may result in faulty software. Security or Privacy Issues - Code may not be optimized for security if not adequately evaluated. Generative AI models have access to large amounts of data that is currently unregulated and may include data that should be protected for privacy reasons. Copyright Violations - The generative AI model may have used code protected by copyright terms that require attribution or do not allow reuse, and it may not be clear where the code came from.
Harmful Code - Currently it is not clear how well generative AI models restrict the creation of code that will be used for goals that harm others. Note that this is an incomplete list; additional ethical concerns will become apparent as we continue to use these new technologies. We highly suggest that users of these tools be transparent about the use of these tools, so that as new ethical issues emerge, we will be better prepared to understand the implications for specific coding projects. Be transparent about what AI tools you use to write code. This helps others to better understand how you created your code, as well as the possible sources that the AI tools might have used when helping you write code. It may also help with future unknown issues related to the use of these tools. Some organizations and scientific societies have created guidelines or requirements for using AI in journal articles and conference submissions, like the International Society for Computational Biology. Be aware of the requirements/guidelines for your field. It is essential to address these ethical concerns and ensure that the use of AI in coding is done in a responsible and transparent manner. This could be done through ensuring the quality of the data used to train AI systems, promoting transparency in AI-generated code, and implementing safeguards against the creation of harmful or biased code. By doing so, we can harness the potential of AI to improve and transform the way we write and optimize code while maintaining ethical standards. 2.3 Replacing Humans Those who use AI tools to write code or to understand code need to recognize their own value in the process. While AI systems are useful, they do not replace the strengths that humans have for innovating new ways to write code, evaluating how the code integrates into the larger picture of a project, or in evaluating the downstream consequences of the code.
Computer science is a field that has historically lacked diversity and new learners can often feel intimidated. There is the potential that new learners may feel even more discouraged as they learn to write code when witnessing AI tools write code. It is critical that we support diverse new learners of computer science, as we will continue to need human involvement in the development and use of AI tools. 2.3.1 Tips for supporting human code Follow best teaching practices for inclusion of diverse learners. Encourage current and future programmers to continue to innovate and write code. Avoid thinking that code written by computers must be better than code written by humans, as this simply is not true. Recall that humans first wrote the code that the AI tools use to write code. Be transparent about when you do or do not use AI tools, and give credit to the humans who write code where possible. A new term in the medical field called AI paternalism describes the concept that doctors (and others) may trust AI over their own judgment or the experiences of the patients they treat. This has already been shown to be a problem with earlier AI systems intended to help distinguish patient groups. Not all humans will necessarily fit the expectations of the AI model if it is not very good at predicting edge cases (Hamzelou n.d.). Therefore, in all fields it is important for us to not forget our value as humans in our understanding of the world. 2.4 Bias One of the biggest concerns is the potential for AI to create biased code. AI systems are trained on data created by humans. If this data used to train the system is biased (and this includes existing code that may be written in a biased manner), the resulting code could also be biased. This could lead to discrimination, abuse, or neglect for certain groups of people, such as those with certain ethnic or cultural backgrounds, genders, ages, sexuality, capabilities, religions or other group affiliations.
It is well known that data and code are often biased (Belenguer 2022). It is important that evaluations be made throughout the software development process to check for and consider potential perpetuation of bias. 2.4.1 Tips for avoiding bias Be aware of the potential biases in the data that is used to train AI systems. Check for possible biases within the data. Are there harmful data values? Examples could include discriminatory and false associations. Are the data adequately inclusive? Examples could include a lack of data about certain ethnic or gender groups or disabled individuals, which could result in code that does not adequately consider these groups, ignores them altogether, or makes false associations. Are the data of high enough quality? Examples could include data that is false about certain individuals. Evaluate the code for biases as it is developed. Check if any of the criteria for weighting certain data values over others are rooted in bias. Consider the possible outcomes of the use of the code. Consider if the code could possibly be used in a manner that will result in discrimination. See Belenguer (2022) for more guidance. We also encourage you to check out the following video for a classic example of bias in AI: 2.5 Misinformation AI tools use data that may contain false information. AI tools may also report data as if it is real, when it is in fact not real. For example, currently at the time of the writing of this course, ChatGPT will report citations with links that are not correct. 2.5.1 Tips for reducing misinformation Be aware that some AI tools currently make up false information based on artifacts of the algorithm. Do not assume that everything that the algorithm produces is real or correct. 2.6 Unintelligible Code There is risk that those less aware of best coding practices use AI-generated code and do not follow these practices.
This could make it difficult for others to understand how the code works and could make it hard to identify and fix any issues that may arise. This could result in negative consequences, such as system crashes or security breaches, that could have been avoided if the code had been written by an experienced and savvy human programmer. 2.6.1 Tips for avoiding unintelligible code Code should be reviewed by experienced programmers. Code should be annotated throughout to explain what the code is doing. Documentation should be created that describes how to use the code properly. Ask AI tools to help you understand the code, but get expert assistance where needed. Can you explain what the code in line 4 is doing? 2.7 Faulty or Buggy Code There is a risk that code written with AI tools, if not reviewed carefully enough, may be faulty. AI systems may not be trained on data that relates to current software dependencies or best practices. It is also important to remember that code generated by AI tools is not necessarily better than code written by humans. It requires just as much code review. 2.7.1 Tips for reducing faulty or buggy code Do not assume that the code generated by AI is correct. Realize that AI is only as good or up-to-date as what it was trained on; the code may be generated using out-of-date code. Look up packages and functions used to ensure that the code is up-to-date. Make sure that you understand the code that you are using. AI can be used to help you understand what the code is doing, but consult with experts when needed. Ask the AI tools for extra information about whether there are any potential limitations or weaknesses in the code, but keep in mind that the tool may not be aware of issues and therefore human review is required. The information provided by the tool can however be a helpful starting point. Are there any limitations associated with this code? What assumptions were made in writing this code?
Stack Overflow, a popular community-based website where programmers help one another, has (at the time of writing this) temporarily banned users from answering questions with AI-generated code. This is because users were posting incorrect answers to questions. It is important to follow this policy (as you may face removal from the community). This policy goes to show that you really need to check the code that you get from AI models. While they are currently helpful tools, they do not know everything. 2.8 Security and Privacy issues AI tools are not always trained in a way that is particularly conscious of data security. Furthermore, if the code is written using these tools by users who are less familiar with coding security concerns, protected data or important passwords may be leaked within the code itself. AI systems may also utilize data that was actually intended to be private. It is also important to consider what data your code might actually be using. 2.8.1 Tips for reducing security and privacy issues Check that all passwords, access tokens (like API keys), and security certificates are not saved in a public place where anyone can access or tamper with them. Check that no sensitive data, such as Personally Identifiable Information (PII), becomes public through the code. Utilize encryption and other security practices where needed. Consult with an expert about data security if you think your code could possibly cause someone to access protected data who is not authorized to access it. Advocate for regulation of AI tools by voting for such standards where possible. Ask AI tools for help, but do not rely on them alone. Are there any security or privacy concerns associated with this code? 2.9 Violating Copyright When AI systems are trained on data, they may also learn and incorporate code from that data. This means that AI-generated code could potentially infringe on the copyright of the original author of the code.
For example, if an AI system is trained on a GitHub repository that contains code written by a human programmer, the AI system could generate code that is identical to or similar to the code in the GitHub repository. If the AI system then uses this code without permission from the original author, this could constitute copyright infringement. In general, we want programmers to feel comfortable sharing their code openly without fear they won’t be credited. Similarly, AI systems could potentially infringe on intellectual property rights by using code that is protected by trademarks or patents. For example, if an AI system is trained on a training manual that contains code that is protected by a trademark, the AI system could generate code that is identical to or similar to the code in the training manual. If the AI system then uses this code without permission from the trademark owner, this could constitute trademark infringement. 2.9.1 Tips for avoiding copyright violations Be transparent about what AI tools you use to write your code. Obtain permission from the copyright holders of any code that you use to train the AI system. Only use code that is in the public domain or that has been licensed for use by the AI system’s owner. Cite any GitHub repositories or training manuals you might use in your code. Ask the AI tools if the code it helped generate used any code that you can cite. Did this code use any code from others that I can cite? 2.10 Harmful code Another major concern is the use of AI to generate malicious code. For instance, AI could be used to create code that spreads malware or hacks into computer systems. This could cause severe damage to individuals and organizations, including data breaches and financial losses. 2.10.1 Tips for avoiding the creation of harmful code Be careful about what code you share publicly, as it could be used for malicious purposes. Consider how the code might be used by others. 
Ask the AI tools to help you, but do not rely on them alone. What are the possible downstream uses of this code? What are some possible negative consequences of using this code? 2.11 Summary Here is a summary of all the tips we suggested: Disclose when you use AI tools to write code. Credit human authors by citing their code and adhering to copyright restrictions. Ensure the code doesn’t expose sensitive data, such as passwords or PII. Don’t assume AI-generated content is real, accurate, or current. Ask the AI tools to help you understand: Sources for the code that you can cite What the code is doing Code limitations Potential security or privacy issues Potential downstream consequences of the code Always have expert humans review the code and value your own contributions and thoughts. Overall, we hope that these guidelines and tips will help us all to use AI tools more responsibly for writing code. We recognize, however, that this is an emerging technology and that more ethical issues will emerge as we continue to use these tools in new ways. AI tools can even help us to use them more responsibly when we ask the right additional questions when writing code, but remember that human review is always necessary. Staying up-to-date on the current ethical considerations will also help us all continue to use AI responsibly.
"],["writing-code.html", "Chapter 3 Writing Code 3.1 Learning Objectives 3.2 Coding Basics 3.3 Writing Code with AI 3.4 Tips for Coding with AI 3.5 Example One: Creating a Function 3.6 Example Two: Debugging Code 3.7 Example Three: Comparing Syntax 3.8 Example Four: Figuring Out a Common Task 3.9 Example Five: Planning Your Program 3.10 Example Six: Selecting a Coding Language 3.11 Hands On Exercise 3.12 Summary", " Chapter 3 Writing Code 3.1 Learning Objectives Recognize the benefits and capabilities of AI-powered programming tools, particularly for certain kinds of tasks that are difficult for humans Explain how AI-generated code and automated programming tools contribute to making programming more accessible to individuals with limited coding experience or to those working in a new language Explore the ethical considerations associated with using AI in programming, including the importance of rigorous testing before publishing code Apply strategies for using AI to build functions, debug code, understand error messages, check syntax, choose the proper programming language, and plan out a multi-step programming project 3.2 Coding Basics In today’s digitally-driven world, coding has become an essential skill. It is used to create websites, mobile applications, and sophisticated software systems. Contrary to popular belief, coding is not only for those who possess an innate talent for programming. With the right resources and a willingness to learn, anyone can become proficient in coding. Practice is crucial to mastering the art of coding, and those who are willing to commit themselves to the learning process can achieve great results. Writing code can also be a rewarding and fulfilling hobby, too. Creating your own software, building a website, or automating a task can be immensely satisfying, and can provide a sense of accomplishment and pride. Moreover, writing code can be a form of creative expression. 
Just as musicians use notes to compose music and artists use paint to create visual art, programmers use code to create digital experiences. From building websites to creating video games, code can be used to bring ideas to life in exciting and innovative ways. Ultimately, the ability to write code is a valuable asset in today’s world, with increasing demand for skilled programmers in a wide range of industries. Whether as a career or hobby, coding can be both personally and professionally rewarding. The accessibility and versatility of coding make it an excellent skill for anyone to learn, regardless of their background or previous experience. 3.3 Writing Code with AI AI-powered programming tools and technologies are revolutionizing the way we approach software development. AI can generate code snippets. An AI-powered code editor can suggest more efficient algorithms or highlight potential errors in real-time. It can also help developers to reproduce bugs more easily and to test potential fixes more quickly. This can save developers a significant amount of time and effort, and it can help them to debug their code more effectively. Be aware that AI still makes a lot of mistakes when writing code, so you will need to be particularly careful in testing code it writes. AI can make programming more accessible. With AI-generated code and automated programming tools, individuals who are not experienced developers can still create software solutions. Experienced developers can also more easily write code in languages with which they aren’t familiar. This has the potential to democratize programming and lead to new innovations from a wider range of people. Can complete novices learn how to program using AI tools? At present, it is very challenging for complete novices to rely on AI chatbots to learn how to write code. 
These tools will sometimes write buggy code or code that doesn’t work as intended, and a person with no programming experience will have difficulty learning how to spot these sorts of mistakes. AI chatbots are best thought of as a supplement to your software development education, not as a replacement, and they should be used alongside other resources. Therefore, we advise that you seek expert review and assistance. As AI tools are refined, complete novices may be able to create software products more readily on their own. AI is changing the skills required for writing code. Rather than simply memorizing syntax, developers need to focus on developing their creativity, intuition, and problem-solving skills. Programmers must also hone their abilities in architecture design and project management, as these skills are becoming increasingly important in the modern software development landscape. As AI continues to evolve, it is likely to have an even greater impact on the way we develop software in the future. As a programmer using AI, it is up to you to be responsible for what you create. You must rigorously test any code you write. It is your job to make sure any code you create with AI is not malicious and works as expected. 3.4 Tips for Coding with AI Using AI to write code is as much of an art as writing your own code. Following a few rules and suggestions will make your experience much more enjoyable and productive. Remember that AI cannot read your mind. You need to tell it what you’re thinking, and it may take several rounds of question-and-answer before you get something you can work with. Be clear about your goals. What do you want the AI to do? Be as specific as possible. Start with a simple task. Don’t try to do too much too soon. Start with a simple task and then gradually work your way up to more complex tasks. You wouldn’t write a complete program all at once, and you shouldn’t have AI do it, either. Provide enough data. 
The more data (and detail) you can give the AI, the better it will be able to learn and perform the task you want it to do. Be patient. Querying an AI can take time. Don’t expect it to be perfect right away. Test everything AI writes. You are responsible for the code that AI generates, even if you didn’t write it yourself. AI can be biased, just like humans, and this could lead to suboptimal development. There is a concern that AI may focus development on areas that optimize certain metrics but not overall quality, effectiveness, or user needs. Human judgment is still needed to ensure that AI is used in a responsible and ethical way. Test your results. Make sure the AI is actually doing what you want it to do. AI systems are trained on data, and if the data is not representative of the real world, the AI system may not be able to generate code that works correctly. It is important to test the code to make sure that it is working as intended and that it is producing the desired results. Inadequate testing could lead to low-quality, buggy code. Because AI-generated code is produced with a high degree of automation, it requires strong testing procedures and safety oversight to ensure it meets standards before deployment. Make sure your code is secure. AI systems can generate code that is vulnerable to attack, and it is important to test the code to make sure that it is secure. This can help to prevent security breaches from occurring. Use a variety of AI models. There are many different AI models available. Experiment with different models to see which one works best for your task. Get feedback from others. Ask other people to review your code and give you feedback. This can help you identify and fix any problems with your code. 3.5 Example One: Creating a Function One of the most common things a programmer does is write functions to automate tedious tasks. With proper guidance, AI can speed this process up. 
Perhaps you need to write a quick function that reverses all the letters in a string. We can query Phind for help. How do I write a function that reverses the letters in a string? To reverse a string in Bash, there are multiple approaches. One way to reverse a string is by using the rev command. According to StackOverflow, the rev command is installed on most Linux systems by default. To use the rev command, simply enter rev followed by the string you want to reverse. For example: $ echo "12345" | rev However, it is important to note that the rev command is not present in all flavors of Unix. Another way to reverse a string is by using a simple Bash script. According to StackOverflow, we can write a Bash script that reverses a string using a for loop. Here is an example script: var="12345" copy=\\${var} len=${#copy} for((i=$len-1;i>=0;i--)); do rev="$rev\\${copy:$i:1}"; done echo "var: $var, rev: $rev" This script takes the string “12345” and stores it in the variable var. It then creates a copy of the string and stores it in the variable copy. The length of the string is stored in the variable len. The for loop starts at the end of the string and iterates through each character, adding it to the variable rev. Finally, the script prints the original string and the reversed string. Another approach to reversing a string is by using the tac command. According to StackOverflow, we can use the tac command to reverse the contents of a file or the standard input. Here is an example command: $ echo "12345" | sed 's/./&\\n/g' | tac | sed -e :a -e 'N;s/\\n//g;ta' This command uses the sed command to insert a newline character after each character in the string. It then passes the output to the tac command, which reverses the order of the lines. Finally, another sed command is used to remove the newline characters and print the reversed string. Overall, there are multiple approaches to reversing a string in Bash. 
The rev command, a simple Bash script with a for loop, and the tac command are all viable options. Phind offers you a variety of ways to do this, but all these options are in the language Bash. Because we didn’t include a programming language in our query, Phind picked one for us. Let’s refine our query a little. How do I write a function using Python that reverses the letters in a string? To write a function using Python that reverses the letters in a string, there are several approaches that can be taken. One approach involves using a for loop and the concatenation operator to join the original string in reverse order. This technique is demonstrated in realpython.com as follows: def reverse_string(text): result = "" for char in text: result = char + result return result print(reverse_string("Hello, World!")) # Output: '!dlroW ,olleH' In this example, the function reverse_string takes a string as its argument and initializes an empty string result. The for loop iterates over each character in the input string, concatenating it with the current content of result and reassigning the new intermediate string to result at each iteration. At the end of the loop, result holds a new string as a reversed copy of the original one. Another approach involves using slicing to reverse the string. This technique is demonstrated in stackoverflow.com as follows: def reverse_string(text): return text[::-1] print(reverse_string("Hello, World!")) # Output: '!dlroW ,olleH' In this example, the function reverse_string takes a string as its argument and returns the string in reverse order using slicing. The syntax text [::-1] means to slice the entire string with a step of -1, which reverses the order of the string. A third approach involves using Python’s built-in reversed() function to reverse the string. 
This technique is demonstrated in datagy.io as follows: def reverse_string(text): return ''.join(reversed(text)) print(reverse_string("Hello, World!")) # Output: '!dlroW ,olleH' In this example, the function reverse_string takes a string as its argument and uses the reversed() function to return an iterator that produces the characters of the string in reverse order. The join() method is then used to convert the iterator back into a string. All three approaches produce the same output, but they differ in terms of performance and readability. The first approach using a for loop and concatenation is the most straightforward, but it can be slower for large strings because it creates successive intermediate strings only to throw them away in the next iteration. The second approach using slicing is more efficient because it creates a new string object only once, but it may be less readable for some programmers. The third approach using reversed() is concise and readable, but it may be slower for large strings because it creates a reversed iterator that must be converted back into a string using join(). Once again, Phind has come through with multiple options. Now let’s say you knew you wanted to write a function like this, but you realize it’s only the strings in the third column of your file. Let’s add that requirement to our query. “How do I write a function using python that reverses the letters in a string for the third column of my file?” To write a function using Python that reverses the letters in a string for the third column of a file, you can use the extended slice syntax in Python. The extended slice syntax is a more concise and faster way of slicing strings in Python. To reverse a string using extended slices, you can use the following syntax: string[::-1] In this syntax, the first colon indicates the start of the slice, the second colon indicates the end of the slice, and the third colon indicates the step size of the slice. 
By setting the step size to -1, you can reverse the string. To create a function that reverses the letters in a string for the third column of a file, you can use the following code: def reverse_string(string): return string[:2] + string[2:5][::-1] + string[5:] In this code, the reverse_string() function takes a string as input and returns the string with its third column reversed. The first two characters of the string are returned as-is, while the third column (characters 3 to 5) are reversed using extended slices. Finally, the remaining characters of the string are returned as-is. There are other ways to reverse a string in Python, including using loops, recursion, stacks, and function calls. However, using extended slices is the most concise and efficient way to reverse a string in Python. Here are some pros and cons of using extended slices to reverse a string: Pros: Concise syntax: using the extended slice syntax is a one-liner that is easy to read and write. Fast execution: using the extended slice syntax is faster than using loops or other methods to reverse a string. Cons: Limited functionality: the extended slice syntax can only be used to reverse a string, and cannot be used for more complex string operations. Not beginner-friendly: the extended slice syntax is not intuitive for beginners who are not familiar with Python’s slicing syntax. In conclusion, to write a function using Python that reverses the letters in a string for the third column of a file, you can use the extended slice syntax in Python. This syntax is concise and efficient, but may not be beginner-friendly and may not be suitable for more complex string operations. 3.6 Example Two: Debugging Code A vast amount of time in programming is spent on debugging code. Often the compiler error messages are very precise about what has gone wrong while simultaneously lacking enough detail for the programmer to immediately understand how to fix the error. 
It is good practice to debug code as you go, instead of writing the entire program and then testing to identify the errors. Using an AI chatbot makes this task much, much easier. There are two options for using ChatGPT for debugging. First, we could query what the error code means. Here’s a Python code snippet that is throwing a TypeError: unsupported operand type(s) for +: 'int' and 'str' error when we try running it. def add_numbers(a, b): result = a + b return result number1 = 10 number2 = "20" sum = add_numbers(number1, number2) print("The sum of", number1, "and", number2, "is", sum) What does this error mean? TypeError: unsupported operand type(s) for +: ‘int’ and ‘str’ This error typically occurs when you try to concatenate (i.e., add together) a string and an integer using the “+” operator in Python. In Python, the “+” operator is used for both addition and concatenation depending on the data types of the operands. If you try to add a string and an integer using the “+” operator, Python will raise a TypeError because these two data types are not compatible for this operation. For example, if you have the following code: age = 25 message = "I am " + age + " years old." Python will raise the TypeError because you are trying to concatenate a string (“I am”) with an integer (25) using the “+” operator. To fix this error, you can convert the integer to a string using the str() function: age = 25 message = "I am " + str(age) + " years old." In this case, the str() function converts the integer age to a string before concatenating it with the other string, so the operation is valid and no error will be raised. ChatGPT helpfully explains what TypeError means, what causes it, and how to fix it. This is quite useful when you don’t understand the error message. Another option is to submit the code snippet that is causing an error to ChatGPT directly and ask for it to fix the mistake. Here’s another buggy code snippet, this time in C#. Can you debug this code? 
using System; class Program { static void Main() { int[] numbers = { 1, 2, 3, 4, 5 }; int sum = 0; for (int i = 1; i <= numbers.Length; i++) { sum += numbers[i]; } Console.WriteLine("The sum of the numbers is: " + sum); } } Sure, I can help you debug this code. Here are the issues I see: The loop is going out of bounds: The loop is iterating from 1 to numbers.Length, but array indexes start from 0. Therefore, when i is equal to numbers.Length, numbers[i] is trying to access an element outside the bounds of the array, which will result in an IndexOutOfRangeException. To fix this, you should change the loop condition to i < numbers.Length. The loop is not summing the array elements correctly: The loop is adding the elements starting from the second element instead of the first. To fix this, you should change the initial value of i to 0, which corresponds to the first element of the array. Here’s the corrected code: using System; class Program { static void Main() { int[] numbers = { 1, 2, 3, 4, 5 }; int sum = 0; for (int i = 0; i < numbers.Length; i++) { sum += numbers[i]; } Console.WriteLine("The sum of the numbers is: " + sum); } } This should correctly sum the elements of the numbers array and output the result. If you choose to submit your code directly to an AI chatbot for debugging, submit one snippet at a time (if possible). Otherwise the answer can get quite unwieldy! 3.7 Example Three: Comparing Syntax You may be in the middle of writing code when you realize that you’ve used two possible syntaxes for the same command. Perhaps you are using the R package here and are wondering if two different ways of writing a command are the same. Is here::here(\"raw\", \"data.csv\") equivalent to here::here(\"raw/data.csv\")? First, let’s see what ChatGPT responded. 
When we first started putting together this course, we had intended to include an answer from Bard that disagreed with the answer from ChatGPT as an example of why it can be helpful to query multiple chatbots. (We decided ChatGPT was correct after verifying its answer in the here package documentation.) However, when we queried Bard again two weeks later, we were pleasantly surprised to discover that Bard had corrected its answer. Now it’s a cool little example of how the chatbots are rapidly improving and “learning” from their mistakes. 3.8 Example Four: Figuring Out a Common Task Sometimes you have a fairly routine task that needs to be done. You might not have done it before, but plenty of people in the world have. In this case, you don’t want to reinvent the wheel, but you may not know exactly where to start. AI is a great option for figuring out the basic workflow steps. A very common task for anyone working with genomic samples is to combine multiple files in the vcf format. Perhaps you’ve been asked to do this and then create a separate file that contains the names of the samples. You turn to ChatGPT for help. Can you write a program that merges vcf files and then outputs the sample names? Great! Of course, maybe you’re not comfortable using bcftools, or you try this and can’t get it to work. You can take advantage of the “Regenerate Response” option in ChatGPT to get another solution. What’s another way I can do this? You can also specify some of the features you’d like in your code, or where you might want the code to come from. Maybe you want to practice using GitHub repositories, so you specifically request it. Can you write a program that merges vcf files and then outputs the sample names using a github repo? If you know you’ve used something like a GitHub repository, you can ask ChatGPT for the citation. Just be aware that you need to independently verify any citation it gives you. 
Unfortunately, AI bots can still respond with imaginary journal articles. (Assume that if you can’t find it, it doesn’t exist.) What’s the citation for the repo used in this code? In general, after you find a solution that works for you, you should always make a habit of asking the AI where it got the code. Never, ever, ever run code without knowing where it came from. What’s the source for this code? You should always take care that you are only using code that you’re authorized to use when writing code. This means asking the AI chatbot where code snippets are from and tracking down those citations and authorizations yourself (instead of just trusting the AI that everything is okay). 3.9 Example Five: Planning Your Program Let’s say we want to create software that plays the card game blackjack. You can jump right in and ask your AI chatbot to write the code, but you’re unlikely to be successful and incredibly likely to get frustrated. Before you start interacting with the AI, you need to plan out what you want to do. Everyone needs a plan of attack when building software. You should start by identifying what your software will do, then you can figure out the steps necessary to make that happen. We’ll ask Claude-instant (as implemented by Poe) how we can go about creating the necessary code for our game. How would I go about writing a program that can play blackjack? That seems like a really good roadmap for building a basic game, but if you’re ambitious (and of course you are), you might want to give your game a little extra something. Claude-instant has some ideas for that, too. What are some other features I could add to this game? At this point, you are ready to begin writing your program. You have a lot of tasks to do, so you have to pick a logical starting point. If you’re uncertain, you can ask AI for advice on your first task. What should be my first coding task for this program? 
In this case, the AI went beyond just suggesting a first task and also offered several options for how to do it as well. 3.10 Example Six: Selecting a Coding Language You will need to choose the language for your software. If you don’t, AI will likely pick a language for you, which may or may not be one you want. You might choose a language based on: Your preexisting knowledge or experience with the language A specific use case where the language is commonly used The language(s) your colleagues or coworkers use You can find a description of other languages you might encounter here. Let’s ask Claude-instant again for some recommendations on the appropriate languages for our proposed blackjack game from the above example. (Pro tip: the various AI bots all keep track of your previous queries, so you don’t have to restate your goal for the software if you’re continuing the conversation.) What would be a good coding language to create this game of blackjack? Claude-instant actually gave us more information than we anticipated. If it hadn’t stated pros and cons while offering language suggestions, we could ask for them directly. This could be helpful if you’re deciding between two options like, say, Python or Java. What are the pros and cons of using Python vs Java for this project? 3.11 Hands On Exercise Now it’s your turn to try! 3.11.1 Code Here’s the code you need for question #2. def calculate_average(numbers): total = 0 for number in numbers: total += number average = total / len(numbers) return average numbers = [1, 2, 3, 4, "5"] average = calculate_average(numbers) print("The average of the numbers is:", average) 3.11.2 Questions Write a prompt to create a function that replaces all instances of the string “three” with the number 3, then create a prompt that allows you to modify the function so that “three”, “Three”, and “THREE” are all replaced with the number 3. Write a query that helps you debug the code snippet from above. 
Create a query that explains how to retrieve data from an online database. Don’t forget to find out what the source of the code is. Write a prompt to plan out a program that creates and plays a game of Rock, Paper, Scissors. You should create a multi-step plan as well as figure out what your first coding task should be. Create a prompt that identifies the best coding language for the above game. 3.12 Summary Coding has become an essential skill in today’s world. AI can increase the efficiency of planning, writing, and debugging code, as well as make coding more accessible for beginners. Querying AI on code writing tasks requires patience. AI cannot read your mind, and users need to be both specific and patient when asking for coding help. It is essential to test all AI-written code while in development, since inadequate testing could lead to low quality, buggy code. A programmer is responsible for what they create while coding with AI, even if AI writes the actual code. devtools::session_info() ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2023-07-17 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2023-03-28 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.0.2) ## cachem 1.0.7 2023-02-24 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## curl 4.3 2019-12-02 [1] RSPM (R 4.0.3) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) ## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) 
## evaluate 0.20 2023-01-17 [1] CRAN (R 4.0.2) ## fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## highr 0.8 2019-03-20 [1] RSPM (R 4.0.3) ## hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0) ## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.0.2) ## httr 1.4.2 2020-07-20 [1] RSPM (R 4.0.3) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2023-03-28 [1] Github (yihui/knitr@a1052d1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.0.2) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## ottrpal 1.0.1 2023-03-28 [1] Github (jhudsl/ottrpal@151e412) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.3) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 [1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## readr 1.4.0 2020-10-05 [1] RSPM (R 4.0.2) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.0 2023-03-14 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2023-03-28 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.2) ## sass 0.4.5 2023-01-24 [1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2023-03-28 [1] Github (R-lib/testthat@e99155a) ## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.0.2) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## utf8 1.1.4 2018-05-24 [1] RSPM (R 4.0.3) ## vctrs 0.6.1 2023-03-22 [1] CRAN (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2023-03-28 [1] Github (yihui/xfun@74c2a66) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] 
/usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["refactoring-code.html", "Chapter 4 Refactoring Code 4.1 Learning Objectives 4.2 Refactoring Basics 4.3 AI in Code Refactoring 4.4 Example One: Correcting Syntax 4.5 Example Two: Styling and Convention 4.6 Example Three: Staying DRY 4.7 Example Four: Keeping Code Alive 4.8 Example Five: Making Code Readable 4.9 Example Six: Keep it Brief 4.10 Code optimization 4.11 Challenges and Limitations 4.12 Hands-On Exercise 4.13 Summary", " Chapter 4 Refactoring Code 4.1 Learning Objectives Describe how refactoring code involves optimization for maintainability, efficiency, and reuse Explain why refactoring code is important for developers in the long-term Recognize the benefits and limitations of using AI tools to refactor code, as well as why AI tools are uniquely poised to be beneficial Implement prompt strategies that can be used to assist with refactoring code for correcting syntax, for adopting more consistent styling, for making code more concise, for making code easier to maintain, and for making code more efficient 4.2 Refactoring Basics Code refactoring is the process of improving the quality of underlying code without changing its functionality. In other words, it’s a way of cleaning up and optimizing code so that it’s easier to maintain and more efficient. This often involves making small changes to the code, such as renaming variables or functions, reorganizing code blocks, or simplifying complex expressions. Refactoring is an essential practice in software development and helps to ensure that the codebase remains manageable and adaptable as requirements and business needs change over time. Code refactoring helps to reduce technical debt, which is the accumulation of development work that needs to be done in the future as a result of taking shortcuts or using less than optimal solutions. 
Refactoring can help to identify potential issues and bugs before they cause problems, resulting in more stable software. Finally, refactoring code to be more readable and less verbose can create fewer headaches for you in the future. To understand why code refactoring is important, imagine trying to cook in a disorganized and cluttered kitchen. You might be a fantastic cook, but you are spending more time searching for utensils and ingredients, and the quality of the meal could suffer as a result. Similarly, code that is disorganized and cluttered can be difficult to work with and can result in lower-quality software. Refactoring is like cleaning and organizing the kitchen - it makes the process of cooking (or coding) smoother and more efficient. 4.3 AI in Code Refactoring Code refactoring has historically been done manually by developers. This involves reviewing code and identifying areas that could be improved or optimized, and then making changes to the codebase accordingly. Though important, this process is time-consuming and labor-intensive, as it requires developers to carefully review every line of code to identify potential issues or areas for improvement. Additionally, manual code refactoring is error-prone, as developers can accidentally introduce bugs or errors into the codebase while making changes. However, AI has significant potential to help with code refactoring. AI can use machine learning algorithms to analyze large amounts of code and identify patterns or areas that could be improved. For example, they can identify sections of code that are redundant, overly complex, or difficult to maintain, and suggest changes that could be made to improve the codebase. Machine learning algorithms can also help to identify potential bugs or security issues in the codebase, which can help to improve the overall quality and stability of the software. AI refactoring is also faster and more accurate than manual refactoring. 
This is particularly useful for large-scale software projects with massive codebases, where manual code review and refactoring can be an enormous task. In the next sections, we’ll take a look at some examples of using AI to refactor code. The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown. Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure. 4.4 Example One: Correcting Syntax Programming languages operate using their established set of rules. For example, Python uses indentation to define code blocks. However, when you shift between development environments, indentation and line breaks can sometimes be affected, causing major headaches! Let’s say we run the following code: def calculate_sum(a,b): result = a + b return result def calculate_product(a,b):result = a * b return result def main(): a=5 b=6 sum_result = calculate_sum(a, b) print("Sum:", sum_result) product_result = calculate_product(a,b) print("Product:", product_result) if __name__ == '__main__': main() ## unexpected indent (<string>, line 3) We get the error above because our indentation is off. Let’s ask ChatGPT to clean things up. Can you assist me with correcting the formatting and indentation issues in my Python code? Of course! 
Here is the code after correcting the formatting and indentation: def calculate_sum(a, b): result = a + b return result def calculate_product(a, b): result = a * b return result def main(): a = 5 b = 6 sum_result = calculate_sum(a, b) print("Sum:", sum_result) product_result = calculate_product(a, b) print("Product:", product_result) if __name__ == '__main__': main() ## Sum: 11 ## Product: 30 Much better! This can be quite handy when you can’t immediately see what is wrong with the formatting or if you are working with a large codebase. 4.5 Example Two: Styling and Convention Sometimes programming languages have more flexible rules where there are multiple ways to take the same or almost the same action. For example, in the R programming language, there are several ways to perform variable assignment. However, it is generally accepted among R programmers that the <- operator should be used. Single and double quotes can be used for character types interchangeably, but it’s generally a good idea to stick with one or the other. 4.5.1 Commonly Accepted Convention Let’s say we have the following code: a <- "apple" b = 'banana' "carrot" -> c c(a, b, c) ## [1] "apple" "banana" "carrot" This code does not produce any errors, but it doesn’t follow typical R convention. Let’s ask ChatGPT to clean things up. Can you help me reformat my R code to follow accepted style conventions? Sure, here’s the cleaned-up R code: a <- "apple" b <- "banana" c <- "carrot" Much better! This code will now look more familiar to other R programmers. 4.5.2 Visual Styling Some languages are flexible about line breaks, indentation, and spacing among code blocks. If you’re rapidly developing, things can get messy pretty quickly. Many development environments, for example RStudio and PyCharm, have the ability to format your code for you without changing any of the code itself. However, you can also get AI to do this according to your own preferences.
d <- c(rep(a,5),rep(b,5),rep(c,5),paste(a,b,c),paste(c,b,a)) This code is pretty hard to read. Let’s see if AI can do better. Can you space out this R code to make it more readable? Sure! Here’s the same code with added spacing for improved readability: d <- c(rep(a, 5), rep(b, 5), rep(c, 5), paste(a, b, c), paste(c, b, a)) Better! It’s easier to interpret because only one action is being taken on each step. By spacing out the code and breaking it into multiple lines, it becomes easier for the human eye to scan for structure, logic, errors, and/or inconsistencies. 4.6 Example Three: Staying DRY The DRY (Don’t Repeat Yourself) programming tenet is essential to good software. By avoiding repetition in code, we can reduce the amount of duplicate code we need to write and maintain. When code is repeated and we need to change the functionality of a certain piece of it, we have to make the same change in multiple places, which can be time-consuming and error-prone. Let’s take the following Python code: def print_greetings(): print("Hello, Alice!") print("Hello, Bob!") print("Hello, Carol!") print("Hello, David!") print("Hello, Eve!") print_greetings() ## Hello, Alice! ## Hello, Bob! ## Hello, Carol! ## Hello, David! ## Hello, Eve! Notice how repetitive this code is? print() and the string "Hello, " appear five times. We can improve this! Could you please refactor my Python code to make it more DRY? Sure, here’s a refactored version of your code: def print_greeting(name): print(f"Hello, {name}!") def print_greetings(): names = ["Alice", "Bob", "Carol", "David", "Eve"] for name in names: print_greeting(name) print_greetings() ## Hello, Alice! ## Hello, Bob! ## Hello, Carol! ## Hello, David! ## Hello, Eve! Now let’s imagine your software lead has asked us to change the greeting to Spanish.
Notice how we only have to change the second line of code: def print_greeting(name): print(f"¡Hola, {name}!") def print_greetings(): names = ["Alice", "Bob", "Carol", "David", "Eve"] for name in names: print_greeting(name) print_greetings() ## ¡Hola, Alice! ## ¡Hola, Bob! ## ¡Hola, Carol! ## ¡Hola, David! ## ¡Hola, Eve! Much easier. 4.7 Example Four: Keeping Code Alive During software development, code is often changing rapidly. It’s common to end up with functions or variables you don’t need anymore. It’s best practice to remove “dead” code from your codebase: it’s distracting and can make maintenance more challenging. Programmers should resist the urge to hang on to unused code in the expectation that they might need it one day. Version control, such as that provided by Git and GitHub, lets you keep a record of old code without cluttering your final version. Let’s take a look at some Python code with some dead code: def default_values(): a = 5 b = 6 return a, b def calculate_sum(a, b): result = a + b return result def calculate_product(x, y): return x * y def simulate_data(): a, b = default_values() dat = calculate_sum(a, b) + calculate_sum(a, b) return dat def main(): a = 5 b = 6 a, b = default_values() sum_result = calculate_sum(a, b) print("Sum:", sum_result) if __name__ == '__main__': main() ## Sum: 11 Some of these functions aren’t used in the final step, typically defined in Python as the main() function. Let’s ask AI to remove them. I need help with eliminating unused functions in my Python script. Sure! Here’s a refactored version: def default_values(): a = 5 b = 6 return a, b def calculate_sum(a, b): result = a + b return result def main(): a = 5 b = 6 a, b = default_values() sum_result = calculate_sum(a, b) print("Sum:", sum_result) if __name__ == '__main__': main() ## Sum: 11 Great! Now our code is much more manageable. This code only includes the necessary functions needed to run the main() function. We can go a step further!
Can you help me refactor my python code to make it more concise? Certainly! Here’s a more concise version of your code: def calculate_sum(a=5, b=6): return a + b def main(): sum_result = calculate_sum() print("Sum:", sum_result) if __name__ == '__main__': main() ## Sum: 11 In this version, the calculate_sum function is simplified to take default arguments for a and b. This means the default_values function is no longer necessary. The a and b variables are also removed from main, as they are no longer needed. Much better and easier to look at overall. 4.8 Example Five: Making Code Readable Usually, our first goal as programmers is to get the response we need from the code. Readability might be the last thing on our minds. Luckily, AI can help with this. Let’s imagine we whipped up this code pretty quickly. It is tough to read, and it could definitely be improved. for(term in c("good","bad","ugly")){ print(c(rep(term,3),paste("really",stringr::str_to_upper(term)))) } ## [1] "good" "good" "good" "really GOOD" ## [1] "bad" "bad" "bad" "really BAD" ## [1] "ugly" "ugly" "ugly" "really UGLY" Let’s see if AI can do better. Can you help me clean up this R code so it is more readable? Sure! Here’s a more readable version of the code: library(stringr) for(term in c("good", "bad", "ugly")) { print(c(rep(term, 3), paste("really", str_to_upper(term)))) } ## [1] "good" "good" "good" "really GOOD" ## [1] "bad" "bad" "bad" "really BAD" ## [1] "ugly" "ugly" "ugly" "really UGLY" I added whitespace around operators and after commas for better readability. I also loaded the stringr package explicitly using library() to make the code more self-contained. This is a good start, but not much has changed. Let’s try again. On ChatGPT, we can click “Regenerate Response”. 
Here’s a cleaned-up version of the R code that is more readable: library(stringr) terms <- c("good", "bad", "ugly") for (term in terms) { rep_term <- rep(term, 3) upper_term <- str_to_upper(term) really_upper_term <- paste("really", upper_term) result <- c(rep_term, really_upper_term) print(result) } ## [1] "good" "good" "good" "really GOOD" ## [1] "bad" "bad" "bad" "really BAD" ## [1] "ugly" "ugly" "ugly" "really UGLY" I’ve stored the repeated term and the uppercased term in separate variables to make the code more readable. I’ve also separated the construction of the “really” phrase from the uppercased term to make it clearer what’s happening. Finally, I’ve used whitespace and indentation to make the structure of the code more apparent. Better! While the code takes up a greater number of lines, it’s easier to interpret because only one action is being taken on each step. 4.9 Example Six: Keep it Brief As we learned above, following the DRY principle and removing dead code can make your code easier to interpret, maintain, and debug. However, sometimes the issue is the functions themselves. No one has perfect knowledge of everything in their programming language. AI can help us determine if there are any more efficient sets of functions that do the same thing. Here is some code that plays with the “CO2” dataset in R: library(datasets) for (type in unique(CO2$Type)) { dat_subset <- CO2[CO2$Type == type, ] for (plant in unique(dat_subset$Plant)) { dat_subset_2 <- dat_subset[dat_subset$Plant == plant, ] print(paste(dat_subset_2$Plant[1], round(mean(dat_subset_2$uptake), 1))) } } ## [1] "Qn1 33.2" ## [1] "Qn2 35.2" ## [1] "Qn3 37.6" ## [1] "Qc1 30" ## [1] "Qc2 32.7" ## [1] "Qc3 32.6" ## [1] "Mn1 26.4" ## [1] "Mn2 27.3" ## [1] "Mn3 24.1" ## [1] "Mc1 18" ## [1] "Mc2 12.1" ## [1] "Mc3 17.3" Let’s see if AI has any ideas for making it more concise. Can you make this code more concise? 
Here’s a more concise version using dplyr package in R: library(dplyr) CO2 %>% group_by(Type, Plant) %>% summarize(mean_uptake = mean(uptake)) ## # A tibble: 12 × 3 ## # Groups: Type [2] ## Type Plant mean_uptake ## <fct> <ord> <dbl> ## 1 Quebec Qn1 33.2 ## 2 Quebec Qn2 35.2 ## 3 Quebec Qn3 37.6 ## 4 Quebec Qc1 30.0 ## 5 Quebec Qc3 32.6 ## 6 Quebec Qc2 32.7 ## 7 Mississippi Mn3 24.1 ## 8 Mississippi Mn2 27.3 ## 9 Mississippi Mn1 26.4 ## 10 Mississippi Mc2 12.1 ## 11 Mississippi Mc3 17.3 ## 12 Mississippi Mc1 18 This code groups the CO2 dataset by Type and Plant, calculates the mean uptake for each group using summarize, and then prints the resulting data frame. The code is shorter and more concise because it uses the dplyr package, which has built-in functions for data manipulation and summarization. Refactoring the code to be more concise using dplyr makes it easier to read and understand the steps involved in the data manipulation. It also reduces the number of nested loops and intermediate variables, making the code more efficient and reducing the chance of errors. Ultimately, it is your job as a programmer to use your best judgment on the fine balance between brevity and readability, as the best solution is not always this obvious. Remember that you can regenerate the AI’s response if you want more options. 4.10 Code optimization Sometimes time and efficiency are of the essence. Loops can be slow and time-consuming in many languages. Take for example the following R code, which uses nested for loops to multiply each element of a vector x by every element of a vector y and sum the products for each element of x: set.seed(123) n <- 10000 x <- rnorm(n) y <- rnorm(n) z <- numeric(n) # Start timer start_time <- proc.time() for(i in 1:n) { for(j in 1:n) { z[i] <- z[i] + x[i]*y[j] } } # End timer proc.time() - start_time ## user system elapsed ## 12.039 0.001 12.041 What is a faster, more optimized way of running the following R code?
set.seed(123) n <- 10000 x <- rnorm(n) y <- rnorm(n) # Start timer start_time <- proc.time() z <- outer(x, y, "*") z <- rowSums(z) # End timer proc.time() - start_time ## user system elapsed ## 0.650 0.356 0.680 The outer() function performs the same calculation as the nested loop in the original code, but more efficiently. It returns a matrix of all possible combinations of x and y values, with each element of the matrix being the product of the corresponding x and y values. The rowSums() function is then used to sum the elements of each row of the matrix, which is equivalent to summing the products of x and y for each index i in the original loop. This method avoids the need for the nested loop, resulting in a faster and more efficient computation. So much faster! We can look at another option by regenerating the response: set.seed(123) n <- 10000 x <- rnorm(n) y <- rnorm(n) z <- numeric(n) # Start timer start_time <- proc.time() z <- x %*% t(y) # End timer proc.time() - start_time ## user system elapsed ## 0.334 0.284 0.396 One optimized way to perform the same calculation is by using the %*% operator to perform matrix multiplication. This can be done by converting x and y to matrices and transposing one of them so that their dimensions align for matrix multiplication. This code should be much faster than the original implementation because it takes advantage of highly optimized matrix multiplication algorithms in R. While this second suggestion is faster, look closely at what it returns: x %*% t(y) on its own produces the full n-by-n matrix of products (the same matrix that outer() builds), so the rows would still need to be summed to reproduce the original z. This is a good reminder to check that a refactored version gives the same result as the original before comparing speed. You will also need to consider what aspects of the codebase are most important in each instance. For example, this code runs more quickly, but the %*% operator might be unfamiliar to some R programmers. In cases where efficiency is less important, or the data are not large, you might consider maximizing readability. 4.11 Challenges and Limitations Like humans, AI can make mistakes. Be sure to keep the following in mind as you use AI to refactor your code.
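Before going through these limitations one by one, here is a minimal sketch of one way to catch an AI refactoring mistake: run the original and the refactored code on the same inputs and compare the results. It is written in Python rather than R, and the function names (row_sums_nested, row_sums_refactored, results_match) are hypothetical names chosen for illustration, not taken from any library or from the examples above.

```python
import math
import random


def row_sums_nested(x, y):
    """Original style: z[i] accumulates x[i] * y[j] over every j."""
    z = [0.0] * len(x)
    for i in range(len(x)):
        for j in range(len(y)):
            z[i] += x[i] * y[j]
    return z


def row_sums_refactored(x, y):
    """Refactored (hypothetical): sum y once, then scale each x[i] by that total."""
    total_y = sum(y)
    return [x_i * total_y for x_i in x]


def results_match(a, b, rel_tol=1e-9, abs_tol=1e-9):
    """Compare two lists of floats element by element, within a tolerance."""
    return len(a) == len(b) and all(
        math.isclose(u, v, rel_tol=rel_tol, abs_tol=abs_tol) for u, v in zip(a, b)
    )


# Compare both versions on the same random inputs before trusting the refactor.
random.seed(123)
x = [random.gauss(0, 1) for _ in range(200)]
y = [random.gauss(0, 1) for _ in range(200)]

assert results_match(row_sums_nested(x, y), row_sums_refactored(x, y))
```

The comparison uses a tolerance instead of exact equality because the two versions add floating-point numbers in a different order. A check like this would also flag a refactor that returns a result of the wrong shape, since the lengths would not match.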
4.11.1 Complexity Refactoring is often a complex process that requires deep understanding of the code and its context. AI may not always be able to fully understand the complexity of the code and may struggle to identify the best refactoring strategy. Use a modular approach whenever possible. 4.11.2 Limited Data AI models require large amounts of data to learn from, but in the case of code refactoring, there is often limited data available. This can make it difficult for AI models to generalize to new code bases and situations, especially if you are using a more niche programming language. 4.11.3 Quality Control Automated refactoring tools that use AI may not always produce code that is of the same quality as code produced by human developers. It can be difficult to always ensure that the refactored code is maintainable, efficient, and free of bugs. You need to use your best judgment when copying and pasting AI-produced code into your codebase. You should always include unit tests in your code. Tests can help you catch bugs, including those introduced accidentally by AI. Because AI models are created by humans, they can be biased. This means they may not always identify your preferred refactorings or may prioritize certain types of refactorings over others. In some cases, this can lead to suboptimal code quality and may create technical debt over time. 4.11.4 Security When using AI to refactor code, the code itself is often sent to an external service or platform for analysis and transformation. This can raise concerns about the security of the code, especially if it contains sensitive information such as trade secrets, proprietary algorithms, or personal data. If your code is sensitive, it’s important to carefully vet any third-party AI tools or services used in the refactoring process. 4.12 Hands-On Exercise Now it’s your turn to try. 4.12.1 The Code Let’s say you are dusting off some code from your past (no judgment here). You were investigating tweets about Mr. 
Trash Wheel, a beloved Baltimore-based contraption that filters trash out of the waterways. Note: This code is just an example and was written strictly for educational purposes. import tweepy import pandas # Enter your API keys and access tokens here consumer_key = 'your_consumer_key' consumer_secret = "your_consumer_secret" access_token = 'your_access_token' access_token_secret = 'your_access_token_secret' # Authenticate with Twitter API auth = tweepy.OAuthHandler(consumer_key, consumer_secret) auth.set_access_token(access_token, access_token_secret) # Search for tweets containing the search term tweets = tweepy.Cursor(tweepy.API(auth).search_tweets, q="Mr. Trash Wheel", tweet_mode='extended').items(100) # Create empty list to store tweet data date_data = [] location_data = [] text_data = [] def get_tweet_length(tweet): # Return the length of the tweet text return len(tweet.full_text) # Loop through each tweet and extract desired data for tweet in tweets: date_info = { 'date': tweet.created_at } date_data.append(date_info) # Search for tweets containing the search term tweets = tweepy.Cursor(tweepy.API(auth).search_tweets, q="Mr. Trash Wheel", tweet_mode='extended').items(100) # Loop through each tweet and extract desired data for tweet in tweets: location_info = { 'location': tweet.user.location } location_data.append(location_info) # Search for tweets containing the search term tweets = tweepy.Cursor(tweepy.API(auth).search_tweets, q='Mr. Trash Wheel', tweet_mode='extended').items(100) # Loop through each tweet and extract desired data for tweet in tweets: text_info = { 'text': tweet.full_text } text_data.append(text_info) # Combine lists into a dictionary data = {'date': date_data, 'location': location_data, 'text': text_data} # Store results in pandas dataframe df = pandas.DataFrame(data) # Print dataframe print(df) 4.12.2 Questions Create an AI prompt that fixes any formatting issues with the code that would cause it not to run. 
Devise an AI prompt that removes any dead code from your sample above. What gets removed? Create a prompt that makes the code less repetitive, adhering to the DRY principle. What aspect of the code was repetitive? Construct a prompt that makes the code more concise. What are some trade-offs that appear in this code between readability and brevity? 4.13 Summary Code refactoring is the process of improving code quality without changing its functionality. It is crucial in software development to maintain a manageable and adaptable codebase. Code refactoring reduces technical debt, improves code stability, and makes it easier to maintain. Examples of using AI for code refactoring include correcting syntax, adhering to styling and convention, visual styling, avoiding repetition, removing dead code, and improving both readability and speed of execution (optimization). The use of AI for code refactoring raises ethical concerns and is not perfect. It is important for the developer to consider security needs of their code, as well as test out their code. "],["annotating-your-code.html", "Chapter 5 Annotating Your Code 5.1 Learning Objectives 5.2 Annotating Code is Good Practice 5.3 Improves readability 5.4 Improves maintainability 5.5 Improves the quality 5.6 Examples of quality aiding comments 5.7 AI in Code Annotation 5.8 Ethics of using AI to annotate code 5.9 Examples on using AI for annotation 5.10 Summary", " Chapter 5 Annotating Your Code 5.1 Learning Objectives Explain the significance and benefits of code annotation and how it aids in understanding and working with code in the future. Demonstrate the ability to apply effective code annotation practices by providing clear and concise comments in code. Recognize the impact of code annotation on code maintainability. Explain how well-annotated code can facilitate collaboration among developers and ensure code consistency. 
Critically evaluate the use of AI in code annotation, including benefits, ethical considerations, and limitations. Explore how AI can be used to create README files, do line-by-line annotations, and offer potential code improvements. 5.2 Annotating Code is Good Practice Code annotation, also known as code commenting, is the process of adding explanatory notes to source code. These notes are used to provide context, clarify functionality, and aid in understanding for developers who may be working on the codebase in the future. Code annotation is an important practice for any developer looking to write clear, efficient, and maintainable code. There are many benefits to annotating code: Improves readability Improves maintainability Improves quality The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown. Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure. 5.3 Improves readability First, it can help to improve the readability of code. By adding comments, developers can explain the purpose of each section of code, which can make it easier for others to understand what the code is doing. This can be especially helpful for large or complex codebases, where it can be difficult to keep track of all the different components. This is particularly important in contexts where the original developer of a project may move on to something else and others are left to work on them. 
Annotation perhaps most often helps a developer remember things they knew about the code when they were originally writing it (annotation is helpful for future you!) If the original developer left well-annotated code, it can drastically improve the ability of others (and their future selves) to continue with the project. 5.3.1 Examples of readability aiding comments Comments that help readability clarify what the code is doing Function explanations: A comment at the beginning of a function or method can describe its purpose, parameters, and expected return values. This makes it easier for others to understand what the function does and how to use it. # This function calculates and returns the sum of two numbers (x and y) def add_numbers(x, y): return x + y Inline explanations: Use inline comments to explain what a specific line or block of code does. This can help someone reading your code to quickly understand what’s going on. # Loop through each item in the list for item in my_list: # Check if the item is greater than 10 if item > 10: # Print the item print(item) TODO comments: Use TODO comments to mark areas of your code that need further work or improvement. This can help you or others to remember to come back to a specific part of the code. # TODO: Implement error handling for this function def my_function(): pass Parameter descriptions: If a function or method has complex parameters, it can be helpful to add comments explaining what each parameter does. def my_function(parameter1, parameter2): """ Calculate the sum of two numbers. Parameters: parameter1 (int): The first number to be added. parameter2 (int): The second number to be added. Returns: int: The sum of parameter1 and parameter2. """ return parameter1 + parameter2 Code block summary explanations: If you have a long or complex code block, you can add a comment to explain what the block is doing. 
# This code block creates a dictionary containing the counts of each word in a list of text strings word_counts = {} for text in text_list: for word in text.split(): if word not in word_counts: word_counts[word] = 0 word_counts[word] += 1 5.4 Improves maintainability Code annotation can help to improve the maintainability of code. By adding comments, developers can explain the reasoning behind certain decisions, which can make it easier for others to make changes to the code without breaking it. This can be especially helpful when multiple developers are working on the same codebase, as it can help to prevent conflicts and ensure that the code is always in a consistent state. 5.4.1 Examples of maintainability aiding comments Comments that help maintainability explain the historical context of why code was made the way it was. Design decisions: If there were specific design decisions made when creating the code, you can add comments explaining why certain choices were made. # We chose to use a linked list data structure for this function to reduce the time complexity of inserting and deleting elements. Legacy code: Sometimes, code may have been written in a certain way due to constraints or limitations at the time it was created. Adding comments to explain this can help others understand why the code is the way it is. # This code was written before Python 3.3, which introduced the `yield from` syntax. Therefore, we used a `for` loop to iterate over the nested list. Performance optimizations: If certain performance optimizations were made to the code, you can add comments explaining why they were necessary. # We used memoization to improve the time complexity of this recursive function, as it was taking too long to execute for larger inputs. Compatibility considerations: If the code was written with compatibility considerations in mind, you can add comments explaining why certain choices were made.
# We used the `os.path` module to ensure that this code will work on both Windows and Unix-based systems, as the path separators are different on each platform. Limitations: If there are limitations or edge cases that the code cannot handle, you can add comments to explain this to others. # Note that this function assumes that the input array is sorted in ascending order. If the array is unsorted, the results may be incorrect. 5.5 Improves the quality Code annotation can help to improve the quality of code. By adding comments, developers can identify potential issues or edge cases, which can help to prevent these issues from occurring in the first place. This can be especially helpful when testing code, as it can help to ensure that the code is working as expected. 5.6 Examples of quality aiding comments Comments that help improve the quality of the code by explaining to others how to use it or help improve it Error handling: Adding comments to explain how and why error handling is being implemented can help ensure that your code is robust and able to handle unexpected inputs or errors. This helps others know how the code was intended to be used. # If the input argument is not a list or is empty, raise a ValueError if not isinstance(input_list, list) or len(input_list) == 0: raise ValueError("Input must be a list that is not empty") Complexity: If your code has particular complexities, adding comments that explain it can help others understand the performance characteristics of your code. It may help others identify whether there is a simpler way to write the code. By adding a comment that expresses uncertainty about the code and asking for suggestions, the author can potentially receive feedback from others on how to improve the code. # This block of code could probably be simplified, but I'm not sure how. 
new_list = [] for i in old_list: if i > 0: new_list.append(i) Constants and variables: Adding comments to explain the purpose of constants and variables can make your code easier to use. It not only notifies others of the variables existence but lets them know if they need to change the parameters for their own purposes. # This constant represents the maximum allowed number of retries when attempting to connect to the server. MAX_RETRIES = 3 # This variable tracks the number of failed attempts to connect to the server. num_retries = 0 Code organization or disorganization: Adding comments to explain the organization and structure of your code can make it easier for others to navigate and may bring out a better organizational strategy for your code. # Define helper functions for data processing def preprocess_data(input_data): # Implementation details def analyze_data(processed_data): # Implementation details def visualize_data(analyzed_data): # Implementation details # Load data from file input_data = load_data("input.txt") # Preprocess data processed_data = preprocess_data(input_data) # Analyze data analyzed_data = analyze_data(processed_data) # Visualize data visualize_data(analyzed_data) Assumptions and constraints: Adding comments to explain the assumptions and constraints of your code can help others understand its limitations and potential pitfalls. # This function assumes that all input values are positive integers. Negative or non-integer inputs may result in unexpected behavior. 5.7 AI in Code Annotation We’ve discussed why code annotation and documentation are important, but when and why would you use AI for code annotation? AI can be a nice tool to supplement the annotation of your code. It should not be the only source of annotation for your code, because as the code author, you need to verify that what AI has annotated is consistent with your knowledge and history of development of the code. 
But using AI to annotate your code can be useful to supplement existing code annotations or to annotate old code that was poorly annotated either by yourself or others who are no longer working on the project. Here are some of the benefits of using AI for code annotation: Speed and efficiency: AI algorithms can analyze code much faster than humans, which means they can quickly generate comments and annotations for large codebases. This can save developers significant time and effort, allowing them to focus on other aspects of the development process. Consistency: Unlike humans, AI is not affected by personal biases or preferences, so it can provide consistent annotations across different code files and projects. A human may underestimate places in the code that should have annotation, whereas an AI might be more consistent at putting annotation in these places. This can help ensure that all code in a project is well-documented and easy to understand. Objectivity: AI can analyze code objectively and identify potential issues that may have been missed by humans. This can lead to better quality code that is easier to maintain and less prone to errors. Learning: AI models can learn from large datasets of annotated code and improve their ability to generate comments and annotations over time. This means that the more code the AI model analyzes, the more accurate and effective it becomes at annotating code. Specificity: AI models can be trained on specific programming languages, frameworks, or libraries, allowing them to generate language-specific comments and annotations that are tailored to the needs of the project. This can be particularly helpful for large, complex projects that require specialized knowledge or expertise. 5.8 Ethics of using AI to annotate code However, there are also a number of potential ethical concerns associated with using AI to annotate code. For example, we don’t yet know how and in what ways AI models may be biased. 
Additionally, AI models may be opaque, which could make it difficult to understand why they made certain annotations. Finally, because AI models are not humans and don’t necessarily tell the stories behind the code, they may be used to generate annotations that are not accurate or helpful, or that do not tell the full depth of the history of what occurred with the code, which could lead to problems. Given these potential benefits and concerns, it is important for users of AI to always realize that they are ultimately accountable for the annotation that an AI model makes, and careful review of this annotation is needed. There are a number of ethical considerations to take into account when using AI to annotate code. Here are a few of the most important considerations: Accountability: First and foremost, a user of AI is always primarily accountable for any output that they use from an AI model. AI models can give you annotation to start with, but it is up to you as the user to verify and review this output carefully. The user of the AI model is ultimately responsible for keeping or throwing out the annotations the AI makes and thus is responsible for any of the model’s errors that they keep. Much like a user of a search engine such as Google is responsible for which results they use, a user of AI is responsible for what output from the AI model they use. Transparency: It is important to be transparent about the use of AI in code annotation. This means disclosing the fact that AI is being used, as well as the specific AI model and dataset that is being used. This should be stated on every file where annotation has been made using AI. This transparency allows others who view the code to more fully interpret the AI-created annotation that accompanies the code. Bias: AI models are trained on data, and this data can introduce bias into the model. It is important to be aware of the potential for bias and to take steps to mitigate it.
This can be done by using a diverse dataset, by carefully selecting the features that are used to train the model, and by using techniques such as adversarial training. It is important to provide annotations that indicate any known biases, possible limitations associated with bias, and any strategies that were used to mitigate bias. Explainability: It is important to be able to explain the decisions that an AI model makes. This is especially important in the case of code annotation, where the decisions made by the AI model can have a significant impact on the quality of the code. There are a number of techniques that can be used to explain the decisions of an AI model, such as visualization and rule extraction. You can (and should) ask a chatbot to explain its sources and rationale for output that it gave. (Be aware that the citations it gives may or may not be accurate, and you need to verify them by doing your own follow-up literature search.) 5.9 Examples on using AI for annotation 5.9.1 Ask for a README summary To try out some annotation with code examples, this author dug up some old code whose purpose she no longer remembered, because time had passed and she had not documented it very well. First, I wanted ChatGPT to make a README for me. A README is a standard file that explains what the code base does and summarizes what is happening in it. I asked ChatGPT if it could: Create a README for this code and then I pasted the old code. ChatGPT gave me some output that generally looked like a README: This README seems accurate given the work I was doing at the time. But the title isn’t great; it doesn’t describe the code well. There is also not much information about the usage or outputs of the code. So I asked it: Can you make this README more succinct and with sections about how you would use this code? This yielded a much better result.
5.9.2 Ask for line by line annotation Second, I wanted to have ChatGPT annotate code line by line. So I asked it: Can you annotate some R code for me? It not only provided helpful annotations but also fixed spacing and added more space between lines of code: It does, however, apparently think the code is Python, judging by the label in the upper left corner. But this doesn’t seem to have affected the accuracy of its annotations. 5.9.3 Ask for how the code could be improved This code is old; I don’t think it runs properly, and it is in no way reproducible. So I wanted to see what ChatGPT would recommend for improvements to the code. I said: How could I make this code better? It had a lot of great advice: First and foremost, it told me my variable names are not consistent and also not descriptive, which is completely accurate! There are variable names like xx and ll that don’t tell the reader anything about the variable. 5.9.4 Style the code Next, I asked ChatGPT if it could restyle the code into tidyverse style. It definitely added elements that look like tidyverse; however, it also added code that I could tell at a glance won’t work. So I tried to tell it about the errors: This code is close but there are some errors. These tidyverse functions don’t work with matrices. and it told me: You are correct that the tidyverse functions are designed to work with data frames and not matrices. Here's a modified version of your code that converts the matrix to a data frame and uses tidyverse functions: The lesson here is that the output of ChatGPT still needs to be vetted by the person asking for it. It is a great idea to keep an AI working on something if the output isn’t quite what you asked for; however, at the end of the day it is you, the human, who has to vet the output. 5.9.5 Sometimes it has trouble with file types if given a URL Copying and pasting code appears to be the best way to go at this point in time.
I asked ChatGPT if it could annotate code in a Jupyter notebook at a URL I gave it: Can you annotate the code at this URL? https://github.com/jhudsl/reproducible-python-example/blob/main/make_heatmap.ipynb It responded with an annotated version of some of the code in that file, but most of the notebook was missing. I told ChatGPT: That isn’t all the code at that url ChatGPT told me: You're right, I apologize. The code at the given URL is actually in a Jupyter Notebook format, so it's not easy to copy-paste here. However, I can suggest an alternative approach: you can use the website nbviewer (https://nbviewer.jupyter.org/) to view the notebook and its code. Here are the steps to do so: Go to https://nbviewer.jupyter.org/ Paste the URL of the notebook https://github.com/jhudsl/reproducible-python-example/blob/main/make_heatmap.ipynb into the "Enter the location of a Jupyter Notebook" field and press "Go". This will display the notebook and its code, and you can read through it and add your own annotations as needed. I hope this helps! This was a nice tool to recommend; however, in the end it still told me I should add my own annotations. 5.10 Summary Code annotation is important for readability, maintainability, and overall quality of code. AI can be a helpful tool for supplementing annotations or creating annotation for poorly annotated code. Ultimately, the code author who uses AI as a tool for annotation is still responsible for verifying that the annotations are consistent with their knowledge of the code.
devtools::session_info() ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2023-07-17 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2023-03-28 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.0.2) ## cachem 1.0.7 2023-02-24 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## curl 4.3 2019-12-02 [1] RSPM (R 4.0.3) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) ## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) ## evaluate 0.20 2023-01-17 [1] CRAN (R 4.0.2) ## fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## highr 0.8 2019-03-20 [1] RSPM (R 4.0.3) ## hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0) ## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.0.2) ## httr 1.4.2 2020-07-20 [1] RSPM (R 4.0.3) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2023-03-28 [1] Github (yihui/knitr@a1052d1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.0.2) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## ottrpal 1.0.1 2023-03-28 [1] Github (jhudsl/ottrpal@151e412) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.3) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 
2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 [1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## readr 1.4.0 2020-10-05 [1] RSPM (R 4.0.2) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.0 2023-03-14 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2023-03-28 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.2) ## sass 0.4.5 2023-01-24 [1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2023-03-28 [1] Github (R-lib/testthat@e99155a) ## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.0.2) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## utf8 1.1.4 2018-05-24 [1] RSPM (R 4.0.3) ## vctrs 0.6.1 2023-03-22 [1] CRAN (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2023-03-28 [1] Github (yihui/xfun@74c2a66) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["understanding-unfamiliar-code.html", "Chapter 6 Understanding Unfamiliar Code 6.1 Learning Objectives 6.2 Reading Unfamiliar Code Is A Skill 6.3 AI in Reading Code 6.4 Example One: Summarizing Code You Didn’t Write 6.5 Example Two: Identifying a Coding Language 6.6 Example Three: Interpreting Regex 6.7 Example Four: Demystifying Functions 6.8 Limitations 6.9 Hands-On Exercise 6.10 Summary", " Chapter 6 Understanding Unfamiliar Code 6.1 Learning Objectives Recognize the benefits, limitations, and assumptions in using AI to understand unfamiliar code Consider the ethical questions around using AI to understand another person’s code Practice using AI to summarize code, identify an unknown coding language, and interpret regex and unknown functions 6.2 Reading Unfamiliar Code Is A Skill As a programmer, you will frequently encounter situations where you need to read and understand code written by other developers. 
This could be because you are collaborating on a project, you have taken over maintenance of an existing codebase, or you are learning new syntax or skills that require reading example codes. Comprehending code written by someone else is a distinct skill from being able to write your own code, and it takes practice to become proficient at it. Reading unfamiliar code is like exploring an unfamiliar city without a map or a guide. Just like in a new city, you may not know where to start or how to navigate the codebase. You may encounter unfamiliar syntax, functions, and libraries that you have never seen before, just as you might encounter new streets, buildings, and landmarks. At first, you may feel disoriented and overwhelmed, and may need to spend some time getting oriented and familiarizing yourself with the environment. As you explore, you may start to see patterns and similarities, just as you might begin to recognize neighborhoods and landmarks in a new city. You may also encounter dead-ends, confusing intersections, and unexpected detours, just as you might encounter bugs and errors in the code. To make progress in this unfamiliar territory, you may need to rely on a combination of intuition, deduction, and experimentation. You may need to break down the code into smaller pieces, analyze the behavior of each piece, and gradually build up a mental model of how the code works. You may also need to consult documentation, online resources, or other experts who are familiar with the codebase, just as you might ask locals or consult a map in a new city. In the end, with patience, persistence, and a willingness to learn, you can successfully navigate the unfamiliar code and achieve your goals, just as you can successfully explore a new city and discover its hidden treasures. Learning how to read and understand code written by someone else is a valuable skill that can help you be a more effective programmer. 
By being able to read and comprehend unfamiliar code, you can contribute to projects more quickly, diagnose and fix bugs more effectively, and learn new programming techniques and best practices. As with any skill, the more you practice reading and understanding code written by others, the more comfortable and proficient you will become at it. 6.3 AI in Reading Code While anyone can learn to understand code written by someone else, using AI language models to achieve this has three significant advantages. Summarize large codebases. AI can quickly scan large codebases and give you a high-level overview of what they do. This can save you a lot of time and effort, especially when dealing with complex codebases. For example, AI can identify the main functions and classes in a codebase, and it can show you how they are related. Analyze the code. AI can look at the code and tell you about its structure, complexity, and potential defects quickly and efficiently. This can help you understand the code better and make it easier to maintain and improve. For example, AI can identify parts of the code that are difficult to understand or maintain, and it can suggest ways to improve the code. Understand comments and documentation. AI can read comments and documentation written by other developers. This can help you understand the code’s purpose and the assumptions and limitations of the codebase. For example, AI can identify the key assumptions that are made in the code, and it can explain the trade-offs that were made in the design of the code. Keep these ethical considerations in mind when using AI to summarize code. Who owns the code? If you are not the owner of the code, you should get permission from the owner before summarizing it. What is the purpose of the summary? If you are summarizing the code for educational or research purposes, it is likely ethical to do so. 
However, if you are summarizing the code for commercial purposes, you may need to get permission from the owner. Does the code contain any sensitive or personal information? If the code contains any sensitive or personal information, you should take special care to protect that information. The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information is protected, are unknown. Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure. 6.4 Example One: Summarizing Code You Didn’t Write Here’s some simple code using the R programming language, written by an unnamed programmer. If you are familiar with R, you might be able to understand what this code does with little effort. However, if you don’t know R, you’re relatively new to R, or you’re just having a bad day, understanding what this bit of code does can take a great deal more energy and time. AI language bots can ease that burden. library(tidyverse) d <- ggplot2::diamonds d_sel <- d %>% select(carat, cut, color, clarity, depth, table, price) d_fil <- d_sel %>% filter(price > 5000) d_bin <- d_fil %>% mutate(carat_bins = cut_width(carat, width = 0.2)) d_summ <- d_bin %>% group_by(carat_bins, cut, color, clarity) %>% summarize(mean_price = mean(price), sd_price = sd(price)) We first ask Claude-instant (as implemented by Poe) to explain the purpose of this code. What does this code do? You can also ask Claude to summarize each step in detail. Can you explain what each step does in detail?
Maybe you’ve never worked with the tidyverse package in R - Claude can tell you about it. What is the tidyverse package? You might also not be familiar with the diamonds dataset (especially if you’ve never used it before) and want some clarification on where the data are from. Can you tell me more about the diamonds dataset? 6.5 Example Two: Identifying a Coding Language Sometimes you might have to work with legacy code. Legacy code can be difficult to work with, especially if it is written in a language or style with which you are not familiar. There are more than 700 programming languages in use today, so it is impossible for any programmer to know them all. AI can be a helpful tool for identifying the language and version of legacy code, which can make your life just a little easier. Let’s look at example code that might have been written decades ago. program temperature_smog_analysis; uses Math; const n_temperatures = 1000; n_smog_measures = 500; type TemperatureArray = array[0..n_temperatures-1] of integer; SmogArray = array[0..n_smog_measures-1] of integer; var temperatures: TemperatureArray; smog_measures: SmogArray; combined_data: array of integer; slope, y_intercept: double; i, j: integer; function connect_to_database(filename: string): integer; begin { implementation of connect_to_database function } end; function retrieve_temperatures(fd: integer): integer; begin { implementation of retrieve_temperatures function } end; function retrieve_smog_measures(fd: integer): integer; begin { implementation of retrieve_smog_measures function } end; function retrieve_temperature(fd, index: integer): integer; begin { implementation of retrieve_temperature function } end; function retrieve_smog_measure(fd, index: integer): integer; begin { implementation of retrieve_smog_measure function } end; procedure close_database(fd: integer); begin { implementation of close_database function } end; We can ask Bard to take a guess at what the coding language might be. 
What language is this code written in? We can explore further to figure out what indicators tell Bard the language is Pascal. How do you know this is Pascal? It turns out there are multiple versions of Pascal. Since you will presumably need to work with this legacy code, you might want to know which version it is. What version of Pascal is this? 6.6 Example Three: Interpreting Regex Regex (short for Regular Expressions) is a powerful tool used to describe patterns in text. It’s a sequence of characters that define a search pattern. Regular expressions can be a powerful tool for data cleaning, text mining, and data validation. They are widely used in web development, data science, and other fields where text processing is important. They can also be tricky to understand at first because they involve a specific syntax that can be complex. We can use ChatGPT to explain regex in a bit of someone else’s code. x = sub("/d/e","/d",x) What does this do? 6.7 Example Four: Demystifying Functions Sometimes we get handed code that includes complex architecture. Perhaps you are working with code you wrote years ago when you really liked loops, but present you finds them difficult to parse. (Be nice to past you - maybe you were a baby programmer and didn’t know better.) Instead of giving yourself a headache, you could turn to ChatGPT to explain what your old code does. def my_function(x): result = x for i in range(10): for j in range(5): result = result + 2 * (i + 1) * (j + 1) * (i % 2 == 0 and j % 2 == 0) - 1 return result What does this function do? Well, now you have an idea what past-you might have been attempting to do with this code. You can also query AI about the potential problems from using the code as-written. What are some possible issues with this code? (And if you do ever run across an expression like this in your code, you can check out our chapter on refactoring code using AI to fix it!) 
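A chatbot's explanation of the nested-loop function above can be checked by hand. The following is a minimal sketch, not the course authors' or any chatbot's answer: `my_function` reproduces the example as written, while `my_function_simplified` is a hypothetical name for a closed form we derived ourselves. The derivation assumes only what the code shows: each of the 50 iterations subtracts 1, and the product term is added only when both `i` and `j` are even, because the boolean expression behaves as 1 or 0 in arithmetic.

```python
def my_function(x):
    """The nested-loop function from the example, with the same behavior."""
    result = x
    for i in range(10):
        for j in range(5):
            # (i % 2 == 0 and j % 2 == 0) is a bool, which acts as 1 or 0 here,
            # so the product only contributes when both i and j are even.
            result = result + 2 * (i + 1) * (j + 1) * (i % 2 == 0 and j % 2 == 0) - 1
    return result


def my_function_simplified(x):
    """Hypothetical closed form (our own derivation, for comparison only)."""
    even_i_sum = sum(i + 1 for i in range(0, 10, 2))  # 1 + 3 + 5 + 7 + 9
    even_j_sum = sum(j + 1 for j in range(0, 5, 2))   # 1 + 3 + 5
    iterations = 10 * 5                               # every iteration subtracts 1
    return x + 2 * even_i_sum * even_j_sum - iterations


if __name__ == "__main__":
    # Compare both versions on a few inputs before trusting the simplification.
    for value in (0, 1, -7, 2.5):
        assert my_function(value) == my_function_simplified(value)
    print("both versions agree on the sampled inputs")
```

Running the comparison on a handful of inputs is a cheap way to vet a claimed simplification, whether it comes from a chatbot or from your own reasoning, before replacing the original code.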
6.8 Limitations Although chatbots can be extremely helpful for better understanding someone else’s code, if there is a lack of annotation or documentation, it will be difficult to discern more about the context of the development. Cases where developers may not have provided context information include: Not stating their purpose for writing this code. Not describing what they planned to do next. Not describing (or being aware of) possible parts of the code that need updating or maybe have security or privacy issues. Not describing how they made certain decisions in the development process. Although we can’t truly know some of this information, it is possible to get some assistance from chatbots with prompts such as: What do you think the purpose of this code is? What might be possible next steps to further develop this code? What are possible issues with this code, particularly for security or privacy? Why was the code possibly written with this structure? What other options are possible? Chatbot tools are also limited by how recent their training data is, so they may not know about current issues with code. 6.9 Hands-On Exercise Now it’s your turn to try. Let’s say you were handed this bit of code to work with (and eventually modify), but first you need to figure out what’s going on with it. Unfortunately, the original programmer left very little in the way of notes or annotation. 6.9.1 The Code Note: This code is just an example and was written strictly for educational purposes.
use ode_solvers::{Euler, OdeMethod}; use plotters::prelude::*; use std::fs::File; const N: f64 = 1000000.0; const I0: f64 = 10.0; const R0: f64 = 0.0; const S0: f64 = N - I0 - R0; const BETA: f64 = 0.3; const GAMMA: f64 = 0.1; fn sir_model(t: f64, y: &[f64], v: f64) -> Vec<f64> { let s = y[0]; let i = y[1]; let r = y[2]; let dsdt = -BETA * s * i / N - v * s; let didt = BETA * s * i / N - GAMMA * i; let drdt = GAMMA * i + v * s; vec![dsdt, didt, drdt] } fn main() { let mut solver = Euler::new(sir_model); solver.set_initial_condition(&[S0, I0, R0]); let root = BitMapBackend::new("sir_vaccination.png", (800, 600)).into_drawing_area(); root.fill(&WHITE).unwrap(); let mut chart = ChartBuilder::on(&root) .caption("SIR model with vaccination", ("sans-serif", 40)) .set_label_area_size(LabelAreaPosition::Left, 60) .set_label_area_size(LabelAreaPosition::Bottom, 40) .build_cartesian_2d(0.0..100.0, 0.0..N) .unwrap(); let mut data1 = vec![]; for _ in 0..1000 { let y = solver.integrate(0.1, 0.0); data1.push((solver.time(), y[0], y[1], y[2])); } let vac_rate = 0.1; let vac_coverage = 0.5; let vac_num = vac_coverage * N; let mut data2 = vec![]; for i in 0..1000 { let t = i as f64 * 0.1; let v = if t >= 50.0 && t < 150.0 {vac_num * vac_rate} else {0.0}; let y = solver.integrate(0.1, v); data2.push((solver.time(), y[0], y[1], y[2])); } chart .configure_mesh() .x_labels(10) .y_labels(10) .disable_x_mesh() .disable_y_mesh() .draw() .unwrap(); chart .draw_series(LineSeries::new( data1.iter().map(|d| 6.9.2 Questions Write a query that identifies the features that can be used to determine the coding language, as well as the version of the language. What language and version is it written in? What features give away the coding language? Create a prompt that determines what this code does, as well as important features of the program. What does this code do? 
Write a prompt that determines the purpose of this specific code snippet: let root = BitMapBackend::new("sir_vaccination.png", (800, 600)).into_drawing_area(); What is the purpose of this line of code? 6.10 Summary Understanding code written by someone else can be made easier by using AI AI chatbots can summarize the code and the dataset used, as well as clarify functions and expressions that are confusing AI chatbots can also identify the coding language and version used in legacy code Be mindful of who owns the code and what the summary will be used for when using AI to summarize someone else’s code "],["ai-for-bioinformatics.html", "Chapter 7 AI for Bioinformatics 7.1 Learning Objectives 7.2 AI in Bioinformatics 7.3 Using these topics for Bioinformatics 7.4 Hands-On Exercise 7.5 Summary", " Chapter 7 AI for Bioinformatics 7.1 Learning Objectives Understand the potential applications of AI assistance in bioinformatics software development Identify the challenges and considerations when using AI in bioinformatics software development Grasp the serious ethical concerns when designing bioinformatics software that deals with sensitive data 7.2 AI in Bioinformatics Bioinformatics is an interdisciplinary field that combines biology, computer science, statistics, and data analysis to extract insights from biological data. It involves the development and application of computational tools, algorithms, and databases to store, organize, analyze, and interpret vast amounts of biological information, particularly in the context of genomics and public health. Biomedical imaging and electronic health record informatics are also rapidly growing subdisciplines. Bioinformatics is especially important for research and work done with genomes. 
In genomics, bioinformatics enables the identification and annotation of genes, the prediction of protein structures and functions, and the analysis of genetic variations, such as single nucleotide polymorphisms and gene copy number variations. These analyses help in unraveling the genetic basis of diseases, discovering potential drug targets, and facilitating personalized medicine approaches. The proliferation of genomic data in recent years has been transformative for the field of genomics. Advances in DNA sequencing technologies, particularly next-generation sequencing, have revolutionized the ability to generate vast amounts of genomic data quickly and at a reduced cost. This has led to an unprecedented increase in the availability and accessibility of genomic information. Researchers estimate that we will generate between 2 and 40 exabytes of genomic data within the next decade. Cloud computing platforms like AnVIL are at the forefront of addressing the big genomic data storage and processing challenge. You have seen principles in previous chapters that likewise apply to creating bioinformatics software. Creating bioinformatics software, however, comes with its own unique set of challenges: Data Complexity: Genomic data is vast and complex, often consisting of large datasets with multiple dimensions, such as nucleotide sequences and quality scores, gene expression levels, and epigenetic modifications. Developing algorithms and software tools to efficiently handle and analyze this data requires expertise in data management, processing, and interpretation as well as biological domain knowledge. Diverse Data Types: Bioinformatics software often handles diverse and complex data beyond strings and numeric types, such as Bioconductor’s Single Cell Experiment data class. Knowledge of common data types used in bioinformatics is required to create functional software.
Algorithmic Complexity: Developing accurate and efficient algorithms for various bioinformatics tasks, such as sequence alignment, gene prediction, and protein structure prediction, requires deep understanding of biological principles and computational techniques. Designing algorithms that can handle noise, errors, and biological variations is a persistent challenge. Integration with Existing Workflows: Bioinformatics software is often part of a larger analysis pipeline or workflow. Integrating AI-based tools seamlessly into existing bioinformatics pipelines and workflows, ensuring compatibility, and maintaining data integrity throughout the process requires special attention. Ethical Considerations: People are typically more sensitive about their health data when compared to their Amazon purchases. This is for a good reason: there is a painful history of the medical establishment performing experiments on people with less socio-political power. Like medical record data, genomic data is supposed to be carefully protected so that it can’t be used to disadvantage certain individuals. AI tools on the other hand, are not typically designed with security and data sovereignty in mind. 7.3 Using these topics for Bioinformatics When using AI to help generate code examples, perform refactoring, understand code, or create unit tests, you should keep the following in mind: Writing Code: AI can be used to automatically generate bioinformatics code snippets or functions. It can provide code suggestions, auto-completion, or even generate code based on given specifications or data classes. This can include creating some unit tests for your functions to ensure they are working as expected. Refactoring Code: AI can aid in refactoring bioinformatics code to improve readability, maintainability, and performance through suggestions for code restructuring. Annotating Code: Bioinformatics tools can be used for clinical decision making. 
It’s important that anyone vetting or auditing your tool can trace any decision making steps. Therefore, clear understanding and documentation are key. AI can assist in drafting documentation and code annotations, such as through summarizing code functionality, clarifying complex expressions or functions, and/or providing plain-terms explanations of the underlying algorithms. Understanding Code: Although bioinformatics is rapidly evolving, its often specific use cases mean that sometimes developers have to work with code that is quite old or outdated. AI can help identify the coding language and version used in legacy bioinformatics code. This can be valuable when working with older software or when integrating different codebases. NEVER submit chunks of code to AI tools without being 100% sure that they are free from personally identifiable information (PII). PII includes variables like gender, race, birth date, geographic location, and disease diagnosis. If you need to, generate dummy data to test your code. 7.4 Hands-On Exercise Now it’s your turn to try. 7.4.1 The Code You are working on improving an alignment algorithm. Alignments are typically used for comparing samples from individuals to a reference consensus sequence. To make sure things are working, you want to compare the alignment created by an existing function (pairwiseAlignment) to an alignment created by your tool (my_new_Aligner). Note: This code is just an example and was written strictly for educational purposes. 
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") if (!requireNamespace("Biostrings", quietly = TRUE)) BiocManager::install("Biostrings") library(Biostrings) reference <- DNAString("ATCGTACGTA") query <- DNAString("ATCGTAGCTA") # Perform pairwise sequence alignment alignment <- pairwiseAlignment(reference, query) # Perform sequence alignment using your algorithm my_alignment <- my_new_Aligner(reference, query) # Compare pairwise alignment to my_alignment my_evaluation <- my_alignment_evaluator(reference, query, alignment, my_alignment) 7.4.2 Questions Things are going great with the toy data! You decide to make a tiny subset of your data for testing your aligner by copying the top 100 sequences from patient 1. You then submit the aligner and data to Bard. You prompt, “is there anywhere handling this data can be made more efficient?” Why is this problematic? Can AI be used to improve annotation in the code above? Why or why not? You prefer writing loops, but have learned recently that R works more efficiently using apply() functions. Is it okay to ask ChatGPT to refactor a small loop into an apply() function? Why or why not? A colleague gives you a flash drive with some code from years ago. They swear it contains some useful snippets that will improve your aligner, but it’s in a language you haven’t worked with before. You inspect the code, and don’t believe it contains protected data. The algorithm was published many years ago in a leading journal. Is it okay to ask AI what the code is doing? 7.5 Summary Like other types of software, bioinformatics software and tools can benefit from AI assistance for writing, refactoring, annotating, and understanding code. Biological data and workflows can be complex. AI does not replace a thorough understanding of biological processes at play. NEVER submit any protected or sensitive data to AI applications. 
This can constitute a serious, often illegal breach of subject privacy if you are dealing with data from people. "],["languages-and-cloud-computing.html", "Chapter 8 Languages and Cloud Computing Programming Languages AI and Cloud Computing", " Chapter 8 Languages and Cloud Computing Programming Languages Here are commonly used languages and their typical use cases. Python Known for its readability, simple syntax, and comprehensive add-on modules. It is often recommended as a beginner-friendly language. Python is used for scientific computing, data analysis, machine learning, and web development. Many data scientist roles rely on Python. Ruby Similar to Python, Ruby is also known for its readability and user-friendly syntax. It is typically used for web development, building dynamic websites, and creating web applications. JavaScript A popular language for web development, known for its versatility and ease of use. Many of the interactive features of websites rely on JavaScript. Java Java is widely used for enterprise applications, Android mobile apps, and web apps. PHP PHP is easy to learn but can become complex as the application grows. It is typically used for building dynamic web pages, web applications, and content management systems. R Used for statistical computing, data analysis, and data visualization, R is a popular choice for researchers and data scientists. Some beginners find its syntax complex, especially due to differences in the two major dialects: “base” R versus the “tidyverse”. Kotlin Kotlin is a relatively new language that is designed to be more concise and expressive than Java. However, its advanced features and concepts may be challenging for beginners. It is used for developing Android mobile apps, web development, and server-side development. Swift Used for iOS and macOS app development, Swift is known for its strict syntax and steep learning curve.
C# Often used for Windows desktop applications, web applications, and video games, C# can be challenging due to its more complex syntax and strict type system. C++ A powerful language often used for developing high-performance applications and operating systems, C++ can be challenging to learn due to its complexity and the need to manage memory manually. Julia Julia is designed for numerical and scientific computing, data analysis, and machine learning. It has a syntax that is easy to learn and read, making it an ideal choice for data scientists, engineers, and researchers. Julia code can also be compiled to native machine code using the Julia compiler. Because Julia is a relatively new language, it has fewer add-on libraries and resources available compared to more established languages. MATLAB MATLAB is a high-level programming language and development environment commonly used for numerical computation, data analysis, and visualization. It is widely used in engineering, physics, and finance, among other fields. It is generally considered to have a user-friendly syntax and is relatively easy to learn compared to other languages. MATLAB has a development environment called MATLAB Compiler that allows you to package MATLAB code into standalone executables or shared libraries for use on a wide variety of operating systems. Fortran Fortran has been around for decades and has a long history in scientific and engineering communities. It has a more challenging syntax and is considered harder to learn than some of the other languages on this list. It is primarily used for scientific and engineering applications that require numerical computations and simulations such as computational physics, weather forecasting, and computational chemistry. Rust Rust is a systems programming language that focuses on speed, reliability, and memory safety. It is commonly used for developing performance-critical software, such as operating systems, web browsers, and game engines. 
Rust’s syntax is similar to C++, but is often considered to have a steeper learning curve due to its focus on memory safety and ownership concepts. AI and Cloud Computing AI in cloud computing is an exciting and rapidly advancing field that combines artificial intelligence (AI) and cloud computing technologies. Cloud computing provides on-demand access to computing resources over the internet, such as storage, processing, and analytics, while AI uses algorithms and models to enable machines to perform tasks traditionally requiring human intelligence. Developers can easily incorporate AI into applications using cloud-based AI services. For example, Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform offer AI services like speech recognition, image recognition, natural language processing, and machine learning. AI optimizes cloud resources and performance. It allocates resources based on workload demands, optimizes power usage, detects threats, and improves application latency by predicting user behavior and preloading content. However, integrating AI and cloud computing presents challenges around data, security, privacy, ethics, complexity, and management that must be addressed carefully. Large amounts of data are needed to train AI models, especially deep learning models requiring vast labeled data to identify patterns and predictions accurately. Storing and processing huge data can be expensive, impacting cost, scalability, and environment performance. Ensuring data security, privacy, and ethical use also require consideration. AI and cloud computing are complex with specialized skills and knowledge to use effectively. Integration increases complexity, challenging management and maintenance. Careful planning and execution evaluates benefits and limits of AI; selects appropriate models and algorithms; and determines integration into the cloud environment. 
Monitoring evaluates model performance, enabling adjustments to optimize performance and efficiency. As both AI and cloud computing technologies advance, the intelligent integration of the two can provide on-demand access to advanced AI capabilities, optimize resources, facilitate smarter applications, and gain data-driven business insights. Integration challenges around data, security, ethics, and complexity must be managed proactively through evaluation, planning, monitoring, and adjustments to maximize benefits and minimize limitations. When implemented responsibly, AI in cloud computing enables remarkable intelligent and scalable systems transforming organizations and society. "],["about-the-authors.html", "About the Authors", " About the Authors These credits are based on our course contributors table guidelines.     Credits Names Pedagogy Lead Content Instructor Elizabeth Humphries (Writing Code, Understanding Unfamiliar Code) Content Authors Candace Savonen (Annotating Your Code), Carrie Wright (Ethics of Using AI), Ava Hoffman (Refactoring Code) Content Editor Reviewer Howard Baek Technical Template Publishing Engineers Candace Savonen, Carrie Wright, Ava Hoffman Publishing Maintenance Engineer Candace Savonen Technical Publishing Stylists Carrie Wright, Ava Hoffman, Candace Savonen Package Developers (ottrpal) Candace Savonen, John Muschelli, Carrie Wright Art and Design Figure Artists Candace Savonen, Carrie Wright, Ava Hoffman, Elizabeth Humphries Funding Funders CS, CW, and HB were supported by the National Cancer Institute (NCI) under Grant UE5CA254170. AMH and EMH were supported by the AnVIL Project through cooperative agreement awards from the National Human Genome Research Institute with cofunding from OD/ODSS to the Broad Institute (U24HG010262) and Johns Hopkins University (U24HG010263). 
Funding Staff Sandra Ormbrek, Shasta Nicholson   Tools Used: ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2023-07-17 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2023-03-28 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.0.2) ## cachem 1.0.7 2023-02-24 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) ## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) ## evaluate 0.20 2023-01-17 [1] CRAN (R 4.0.2) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.0.2) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2023-03-28 [1] Github (yihui/knitr@a1052d1) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 [1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.0 2023-03-14 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2023-03-28 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.2) ## sass 0.4.5 2023-01-24 
[1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2023-03-28 [1] Github (R-lib/testthat@e99155a) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2023-03-28 [1] Github (yihui/xfun@74c2a66) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["references.html", "References", " References "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] diff --git a/docs/understanding-unfamiliar-code.html b/docs/understanding-unfamiliar-code.html index 977cf992..fbcbc4dc 100644 --- a/docs/understanding-unfamiliar-code.html +++ b/docs/understanding-unfamiliar-code.html @@ -417,6 +417,10 @@

    6.3 AI in Reading Code

    What is the purpose of the summary? If you are summarizing the code for educational or research purposes, it is likely ethical to do so. However, if you are summarizing the code for commercial purposes, you may need to get permission from the owner.

    Does the code contain any sensitive or personal information? If the code contains any sensitive or personal information, you should take special care to protect that information.

    +
    +

    The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, and of how sensitive or personally identifiable information is protected, are unknown.

    +

    Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure.

    +

    6.4 Example One: Summarizing Code You Didn’t Write

    diff --git a/docs/writing-code.html b/docs/writing-code.html index 48a22335..31ef5b16 100644 --- a/docs/writing-code.html +++ b/docs/writing-code.html @@ -734,7 +734,7 @@

    3.12 Summary

    ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC -## date 2023-07-14 +## date 2023-07-17 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source From 7ed23e5337d4dcad29bd9ab76b660475db9517ca Mon Sep 17 00:00:00 2001 From: jhudsl-robot Date: Mon, 17 Jul 2023 20:40:33 +0000 Subject: [PATCH 2/2] Render toc-less --- docs/no_toc/01-intro.md | 2 +- docs/no_toc/03-writing_code.md | 2 +- docs/no_toc/04-refactoring.md | 12 +++++++++--- docs/no_toc/05-annotation.md | 8 +++++++- docs/no_toc/06-understanding_other_ppl_code.md | 6 ++++++ docs/no_toc/About.md | 2 +- docs/no_toc/about-the-authors.html | 2 +- docs/no_toc/annotating-your-code.html | 6 +++++- docs/no_toc/introduction.html | 2 +- docs/no_toc/refactoring-code.html | 10 +++++++--- docs/no_toc/search_index.json | 2 +- docs/no_toc/understanding-unfamiliar-code.html | 4 ++++ docs/no_toc/writing-code.html | 2 +- 13 files changed, 45 insertions(+), 15 deletions(-) diff --git a/docs/no_toc/01-intro.md b/docs/no_toc/01-intro.md index 426a9331..3dfa00fb 100644 --- a/docs/no_toc/01-intro.md +++ b/docs/no_toc/01-intro.md @@ -71,7 +71,7 @@ The methods used for chatbots have evolved over time. Now chatbots often utilize Despite the fact that chatbots have been around awhile, the popularity of OpenAI's ChatGPT and DALL-E programs has sparked a recent surge of interest. These chatbots are in part particularly powerful due to the fact that large amounts of computing power were used to train their NLP models on very large datasets [@caldarini2022literature; @cahn2017chatbot]. Large language model AIs can be divided into two categories: those that can be reached using an internet browser, and those that can be reached using an integrated development environment (IDE). ::: warning -The information presented in this course is meant for use with open source code and software. 
It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved and how sensitive or personally identifiable information are protected. +The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown. Err on the side of caution when interacting with them. We do **not** recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure. ::: diff --git a/docs/no_toc/03-writing_code.md b/docs/no_toc/03-writing_code.md index f64a7b9c..d171f1e4 100644 --- a/docs/no_toc/03-writing_code.md +++ b/docs/no_toc/03-writing_code.md @@ -538,7 +538,7 @@ devtools::session_info() ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC -## date 2023-07-14 +## date 2023-07-17 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source diff --git a/docs/no_toc/04-refactoring.md b/docs/no_toc/04-refactoring.md index 0e0d2c25..853596b9 100644 --- a/docs/no_toc/04-refactoring.md +++ b/docs/no_toc/04-refactoring.md @@ -30,6 +30,12 @@ However, AI has significant potential to help with code refactoring. AI can use AI refactoring is also faster and more accurate than manual refactoring. This is particularly useful for large-scale software projects with massive codebases, where manual code review and refactoring can be an enormous task. In the next sections, we'll take a look at some examples of using AI to refactor code. 
+::: warning +The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown. + +Err on the side of caution when interacting with them. We do **not** recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure. +::: + ## Example One: Correcting Syntax Programming languages operate using their established set of rules. For example, [Python](https://www.python.org/) uses indentation to define code blocks. However, when you shift between development environments, indentation and line breaks can sometimes be affected, causing major headaches! @@ -547,7 +553,7 @@ proc.time() - start_time ``` ## user system elapsed -## 280.405 0.068 280.464 +## 210.361 0.068 210.452 ``` :::{.query} @@ -575,7 +581,7 @@ proc.time() - start_time ``` ## user system elapsed -## 0.733 0.308 0.942 +## 0.771 0.244 0.746 ``` The `outer()` function performs the same calculation as the nested loop in the original code, but more efficiently. It returns a matrix of all possible combinations of x and y values, with each element of the matrix being the product of the corresponding x and y values. The `rowSums()` function is then used to sum the elements of each row of the matrix, which is equivalent to summing the products of x and y for each index `i` in the original loop. This method avoids the need for the nested loop, resulting in a faster and more efficient computation. 
@@ -603,7 +609,7 @@ proc.time() - start_time ``` ## user system elapsed -## 0.424 0.340 0.496 +## 0.334 0.248 0.357 ``` One optimized way to perform the same calculation is by using the `%*%` operator to perform matrix multiplication. This can be done by converting x and y to matrices and transposing one of them so that their dimensions align for matrix multiplication. This code should be much faster than the original implementation because it takes advantage of highly optimized matrix multiplication algorithms in R. diff --git a/docs/no_toc/05-annotation.md b/docs/no_toc/05-annotation.md index f63cd8d7..337a5535 100644 --- a/docs/no_toc/05-annotation.md +++ b/docs/no_toc/05-annotation.md @@ -19,6 +19,12 @@ There are many benefits to annotating code: - Improves maintainability - Improves quality +::: warning +The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown. + +Err on the side of caution when interacting with them. We do **not** recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure. +::: + ## Improves readability Dino says It’s great this code is well annotated using AI. I totally forgot what I was doing with this code when I wrote it. 
@@ -368,7 +374,7 @@ devtools::session_info() ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC -## date 2023-07-14 +## date 2023-07-17 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source diff --git a/docs/no_toc/06-understanding_other_ppl_code.md b/docs/no_toc/06-understanding_other_ppl_code.md index f47dab84..50c2b820 100644 --- a/docs/no_toc/06-understanding_other_ppl_code.md +++ b/docs/no_toc/06-understanding_other_ppl_code.md @@ -41,6 +41,12 @@ Keep these ethical considerations in mind when using AI to summarize code. **Does the code contain any sensitive or personal information?** If the code contains any sensitive or personal information, you should take special care to protect that information. ::: +::: warning +The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown. + +Err on the side of caution when interacting with them. We do **not** recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure. +::: + ## Example One: Summarizing Code You Didn't Write Here's some simple code using the R programming language, written by an unnamed programmer. If you are familiar with R, you might be able to understand what this code does with little effort. However, if you don't know R, you're relatively new to R, or you're just having a bad day, understanding what this bit of code can take a great deal more energy and time. AI language bots can ease that burden. 
diff --git a/docs/no_toc/About.md b/docs/no_toc/About.md index 9017e2ff..681e0630 100644 --- a/docs/no_toc/About.md +++ b/docs/no_toc/About.md @@ -38,7 +38,7 @@ Tools Used: ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC -## date 2023-07-14 +## date 2023-07-17 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source diff --git a/docs/no_toc/about-the-authors.html b/docs/no_toc/about-the-authors.html index 341100b7..9cf662c7 100644 --- a/docs/no_toc/about-the-authors.html +++ b/docs/no_toc/about-the-authors.html @@ -471,7 +471,7 @@

    About the Authors

    ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC -## date 2023-07-14 +## date 2023-07-17 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source diff --git a/docs/no_toc/annotating-your-code.html b/docs/no_toc/annotating-your-code.html index 667908a8..72976d41 100644 --- a/docs/no_toc/annotating-your-code.html +++ b/docs/no_toc/annotating-your-code.html @@ -406,6 +406,10 @@

    5.2 Annotating Code is Good Pract
  • Improves maintainability
  • Improves quality
  • +
    +

    The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, and of how sensitive or personally identifiable information is protected, are unknown.

    +

    Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure.

    +

    5.3 Improves readability

    @@ -644,7 +648,7 @@

    5.10 Summary

    ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC -## date 2023-07-14 +## date 2023-07-17 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source diff --git a/docs/no_toc/introduction.html b/docs/no_toc/introduction.html index 236e8db2..6fc32579 100644 --- a/docs/no_toc/introduction.html +++ b/docs/no_toc/introduction.html @@ -433,7 +433,7 @@

    1.4 The AI Chatbots

    The methods used for chatbots have evolved over time. Chatbots now often use AI methods like deep learning (which involves multiple layers of abstraction of the input data (“Deep Learning” 2023)) to extract meaning from language data (“Natural Language Processing” 2023). Because these methods are trained on large quantities of text, they are often called large language models (“Large Language Model” 2023).

    Although chatbots have been around for a while, the popularity of OpenAI’s ChatGPT and DALL-E programs has sparked a recent surge of interest. These chatbots are particularly powerful in part because large amounts of computing power were used to train their NLP models on very large datasets (Caldarini, Jaf, and McGarry 2022; Cahn 2017). Large language model AIs can be divided into two categories: those that can be reached using an internet browser, and those that can be reached using an integrated development environment (IDE).

    -

    The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved and how sensitive or personally identifiable information are protected.

    +

    The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, and of how sensitive or personally identifiable information is protected, are unknown.

    Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure.

    The dinosaur cartoon says in a speech bubble 'Wow, there are already so many options!' as names of AI tools swirl around the dinosaur's head.

    diff --git a/docs/no_toc/refactoring-code.html b/docs/no_toc/refactoring-code.html index c9804a8f..175e0a3a 100644 --- a/docs/no_toc/refactoring-code.html +++ b/docs/no_toc/refactoring-code.html @@ -407,6 +407,10 @@

    4.3 AI in Code Refactoring

    Code refactoring has historically been done manually by developers. This involves reviewing code and identifying areas that could be improved or optimized, and then making changes to the codebase accordingly. Though important, this process is time-consuming and labor-intensive, as it requires developers to carefully review every line of code to identify potential issues or areas for improvement. Additionally, manual code refactoring is error-prone, as developers can accidentally introduce bugs or errors into the codebase while making changes.

    However, AI has significant potential to help with code refactoring. AI can use machine learning algorithms to analyze large amounts of code and identify patterns or areas that could be improved. For example, they can identify sections of code that are redundant, overly complex, or difficult to maintain, and suggest changes that could be made to improve the codebase. Machine learning algorithms can also help to identify potential bugs or security issues in the codebase, which can help to improve the overall quality and stability of the software.

    AI refactoring can also be faster and more accurate than manual refactoring. This is particularly useful for large-scale software projects with massive codebases, where manual code review and refactoring can be an enormous task. In the next sections, we’ll take a look at some examples of using AI to refactor code.

    +
    +

    The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, and of how sensitive or personally identifiable information is protected, are unknown.

    +

    Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure.

    +

    4.4 Example One: Correcting Syntax

    @@ -762,7 +766,7 @@

    4.10 Code optimization

    # End timer proc.time() - start_time
    ##    user  system elapsed 
    -## 280.405   0.068 280.464
    +## 210.361 0.068 210.452

    What is a faster, more optimized way of running the following R code?

    @@ -781,7 +785,7 @@

    4.10 Code optimization

    # End timer proc.time() - start_time
    ##    user  system elapsed 
    -##   0.733   0.308   0.942
    +## 0.771 0.244 0.746

    The outer() function performs the same calculation as the nested loop in the original code, but more efficiently. It returns a matrix of all possible combinations of x and y values, with each element of the matrix being the product of the corresponding x and y values. The rowSums() function is then used to sum the elements of each row of the matrix, which is equivalent to summing the products of x and y for each index i in the original loop. This method avoids the need for the nested loop, resulting in a faster and more efficient computation.
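
The equivalence described above can be checked on a small example. This is a minimal sketch, not the course's original code: the vectors `x` and `y` and the exact form of the nested loop are assumptions reconstructed from the description, since the original loop is not shown in this hunk.

```r
# Hypothetical small inputs; the course's actual vectors are not shown here.
x <- c(1, 2, 3)
y <- c(4, 5)

# Assumed form of the nested loop: z[i] accumulates x[i] * y[j] over every j.
z_loop <- numeric(length(x))
for (i in seq_along(x)) {
  for (j in seq_along(y)) {
    z_loop[i] <- z_loop[i] + x[i] * y[j]
  }
}

# Vectorized version: outer() builds the length(x) by length(y) matrix of
# products, and rowSums() adds across each row.
z_outer <- rowSums(outer(x, y))

# Both approaches should agree.
stopifnot(isTRUE(all.equal(z_loop, z_outer)))
```

Running both versions side by side on small inputs like this is a quick way to confirm that an AI-suggested refactor preserves the original behavior before timing it on real data.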

    So much faster! We can look at another option by regenerating the response:

    @@ -800,7 +804,7 @@

    4.10 Code optimization

    # End timer proc.time() - start_time
    ##    user  system elapsed 
    -##   0.424   0.340   0.496
    +## 0.334 0.248 0.357

    One optimized way to perform the same calculation is by using the %*% operator to perform matrix multiplication. This can be done by converting x and y to matrices and transposing one of them so that their dimensions align for matrix multiplication. This code should be much faster than the original implementation because it takes advantage of highly optimized matrix multiplication algorithms in R.

    While this second suggestion is faster, you will need to consider what aspects of the codebase are most important in each instance. For example, this code runs more quickly, but the %*% operator might be unfamiliar to some R programmers. In cases where efficiency is less important, or the data are not large, you might consider maximizing readability.
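
The matrix multiplication approach can be sketched on a small example. This is an illustration under stated assumptions, not the chatbot's actual response: `x`, `y`, and the loop used for cross-checking are hypothetical stand-ins for the course's code, which is not shown in this hunk.

```r
# Hypothetical small inputs, not the course's data.
x <- c(1, 2, 3)
y <- c(4, 5)

# as.matrix() turns each vector into a one-column matrix; t() flips y into a
# single row so the dimensions align: (3 x 1) %*% (1 x 2) gives a 3 x 2 matrix.
products <- as.matrix(x) %*% t(as.matrix(y))

# Summing each row gives one value per element of x.
z_matmul <- rowSums(products)

# Cross-check against a plain nested loop (assumed form of the original).
z_loop <- numeric(length(x))
for (i in seq_along(x)) {
  for (j in seq_along(y)) {
    z_loop[i] <- z_loop[i] + x[i] * y[j]
  }
}
stopifnot(isTRUE(all.equal(z_matmul, z_loop)))
```

If the original loop really has this form, `x * sum(y)` would give the same result without building the intermediate matrix at all, which may matter when the vectors are large.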

    diff --git a/docs/no_toc/search_index.json b/docs/no_toc/search_index.json index 07f78cf8..30ad7ea7 100644 --- a/docs/no_toc/search_index.json +++ b/docs/no_toc/search_index.json @@ -1 +1 @@ -[["index.html", "AI for Software Development Harnessing the Power of Large Language Models About this Course Available course formats Suggested Activity Context Learning Objectives Lesson Breakdown", " AI for Software Development Harnessing the Power of Large Language Models July, 2023 About this Course This course on AI for software development explores the use of AI large language models such as ChatGPT, Bard, and others and their potential benefits and challenges. Through examples and hands-on activities, students will develop an understanding of the ways in which AI can speed up software development tasks and free up time for more creative and strategic work. By the end of the course, students will be equipped to navigate the rapidly changing landscape of software development and use AI chatbots in a way that maximizes benefits and efficiency while limiting harm as much as possible. Available course formats This course is available in multiple formats which allows you to take it in the way that best suits your needs. The material for this course can be viewed without login requirement on this Bookdown website. This format might be most appropriate for you if you rely on screen-reader technology. Our courses are open source, you can find the source material for this course on GitHub. The slides for the images in the course can be found here. Suggested Activity Context Course Audience Professionals looking to improve efficiency Students hoping to learn more about programming Anyone curious about how AI can be harnessed for technology Note: Those completely new to programming will find using AI tools to create software solutions challenging at this time. 
It is helpful to have some fundamental knowledge to write appropriate prompts and to discern when code is not working as expected. We advise novices to seek expert review. Course Prerequisites Some previous exposure to programming recommended. Assessment Type Hands-on activities following each lesson. Learning Objectives Learners will be able to: Explain the basics of AI and its potential for improving software development workflows Implement strategies to write and debug code using AI-based tools Describe the benefits of refactoring code using AI-powered techniques, such as making code readable, keeping it brief, and optimizing code Apply best practices for annotating code using AI Recognize strategies for using AI-based tools to understand and analyze code, such as code comprehension of unfamiliar languages or functions Discuss the challenges and ethical implications of using AI for different aspects of software development Lesson Breakdown Activity Duration Introduction 10 minutes Ethics for Using AI 20 minutes Writing Code 35 minutes Writing Exercise 15 minutes Refactoring Code 35 minutes Refactoring Exercise 15 minutes Annotating Code 25 minutes Understanding Code 25 minutes Understanding Exercise 10 minutes Total 3 hours & 10 minutes "],["introduction.html", "Chapter 1 Introduction 1.1 Learning Objectives 1.2 AI Provides New Opportunities 1.3 AI Code of Ethics 1.4 The AI Chatbots 1.5 Reproducibility With AI Chatbots 1.6 How to Use This Book 1.7 About This Book", " Chapter 1 Introduction The increasing availability of AI chatbots is creating a new landscape in software development and data science. It is important to be aware of the potential benefits and challenges of this technology, so that we can use it in a way that benefits everyone and limits harm as much as possible. On the one hand, AI chatbots have the potential to replace or significantly reduce the workload of programmers. This could lead to job losses in the data science industry. 
Additionally, there are significant ethical and privacy concerns regarding the use of training data sets to teach these AI systems. For example, if a chatbot is trained on a dataset of biased or discriminatory data, it could perpetuate those biases in its interactions with users. On the other hand, AI chatbots also have the potential to create new jobs and opportunities. For example, chatbot developers will be needed to create and maintain these AI systems. Additionally, AI chatbots could be used to automate tasks that are currently done by humans, freeing up programmers to focus on more creative and strategic work of interpreting what the data actually mean. Ultimately, while the future of programming in software development and data science is unclear, there is much to be optimistic about in this brave new world. The advancements in AI and chatbot technology present an opportunity for programmers to work alongside algorithms to create innovative solutions that were previously unimaginable. As with any technological revolution, there will be challenges, but it is up to us to work together to ensure that we use these tools ethically and responsibly, and to create a future that benefits everyone. 1.1 Learning Objectives In this chapter we will demonstrate how to: Consider the potential benefits and limitations of AI for software development Recognize how to use this course, how it was written, and what to expect Explain general concepts for ethical use of AI tools for software development Summarize what Chatbots are and how they work Distinguish between different AI options for software development 1.2 AI Provides New Opportunities We do not believe that AI chatbots will replace programmers, software developers, and data scientists. While AI can be powerful and useful, it is only as good as the training data behind it and the human using it. 
Therefore, for the time being and the foreseeable future, humans will remain at the forefront of programming innovation, with AI providing valuable support. As new code is often built on top of existing code written by humans, it is unlikely that AI will fully replace the need for human programmers. However, we do think that the roles of programmers in software development and data science are changing, and some of the skills required to work in this new environment are slightly different from those needed before. In addition to proficiency in programming syntax, programmers may need to develop a more refined ability to construct queries that enable AI bots to create the desired workflows. Other critical skills such as debugging, troubleshooting, and documentation will continue to be valuable, and can be enhanced through the use of AI. Furthermore, AI creates new opportunities and niches for tasks that were previously difficult, time-consuming, or impossible. This is an exciting time for those working in programming, with great potential to explore new and innovative solutions. As AI technology continues to advance, its role in programming will only become more significant, and it is up to developers to adapt and learn how to use these new tools effectively. We asked OpenAI’s ChatGPT whether AI chatbots would replace human software developers, data scientists, and programmers, and this was the response: “It is unlikely that AI chatbots will replace software developers, data scientists, and programmers entirely. While AI chatbots can perform certain tasks related to software development, data analysis, and programming, they are limited in their abilities and require human oversight and input to function properly. AI chatbots can be used to automate certain tasks such as generating simple code snippets, running tests, and performing simple data analysis. 
They can also assist in the development process by providing quick answers to commonly asked questions and suggesting possible solutions to programming problems. However, more complex software development, data analysis, and programming tasks require a level of creativity, critical thinking, and problem-solving that AI chatbots cannot currently replicate. Software developers, data scientists, and programmers have a deep understanding of the technologies and tools used to develop AI chatbots, as well as the ability to optimize and improve those tools. Furthermore, software development, data analysis, and programming require a high level of human communication and collaboration, which AI chatbots cannot currently replicate. Humans are also essential in making ethical decisions related to AI development, such as ensuring that AI systems are transparent, fair, and unbiased. In summary, while AI chatbots can provide valuable assistance to software developers, data scientists, and programmers, they are unlikely to replace them entirely. Human expertise, creativity, and critical thinking will continue to be essential in developing and optimizing AI systems for the foreseeable future.” 1.3 AI Code of Ethics Our first duty as programmers is to avoid causing harm to people and the world. This includes avoiding both intentional and inadvertent harm. Intentional harm is harm that is caused by a programmer’s deliberate actions. For example, a programmer might create a virus that is designed to harm computers. Inadvertent harm is harm that is caused by a programmer’s negligence or lack of foresight. For example, a programmer might create a program that is buggy and causes data loss, or create a program that is particularly vulnerable to being hacked. We must always be aware of the potential for harm and deliberately take steps to mitigate it. This includes things like: Be aware of the potential for bias. 
AI systems are trained on data, and that data can reflect the biases of the people who collected it. Programmers should be aware of the potential for bias in their data and take steps to mitigate it. Be transparent about your work. Share information about your code, data, and algorithms. This will help others to understand how your work works and to identify any potential problems. Be accountable for your work. Take responsibility for the consequences of your code. Be willing to make changes if necessary. Participate in discussions about AI ethics. There are many online and in-person communities where you can discuss the ethical implications of AI with other people. Advocate for ethical AI. Speak up when you see AI being used in a way that you believe is unethical. Support organizations that are working to promote ethical AI. 1.4 The AI Chatbots Humans have been interacting with AI chatbots for years. In fact, Alan Turing is credited with coming up with the concept for chatbots as early as 1950. Chatbots are software-based systems that interact with humans typically by text or speech inputs, rather than code. They mimic some human activity (“Chatbot” 2023; Abdulla et al. 2022) based on these language inputs. They process the inputs using natural language processing, commonly abbreviated as NLP. NLP is a kind of AI that uses human text or speech and parses the language to determine structures and patterns to extract meaning. NLP uses large amounts of language data (such as books, websites, etc.) to train AI systems to identify these structures and patterns. For example, the AI model might identify when a sentence is a question or a statement by examining various features in a prompt such as the inclusion of a question mark or the use of words often used in questions (“Natural Language Processing” 2023; Cahn 2017). The methods used for chatbots have evolved over time.
Now chatbots often utilize AI methods like deep learning (which involve multiple layers of abstractions of the input data (“Deep Learning” 2023)) to extract meaning from the language data (“Natural Language Processing” 2023). As these methods use large quantities of text, they are therefore often called large language models (“Large Language Model” 2023). Despite the fact that chatbots have been around awhile, the popularity of OpenAI’s ChatGPT and DALL-E programs has sparked a recent surge of interest. These chatbots are in part particularly powerful due to the fact that large amounts of computing power were used to train their NLP models on very large datasets (Caldarini, Jaf, and McGarry 2022; Cahn 2017). Large language model AIs can be divided into two categories: those that can be reached using an internet browser, and those that can be reached using an integrated development environment (IDE). The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved and how sensitive or personally identifiable information are protected. Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure. This is not a comprehensive list! Browser-based chatbots Bard is Google’s large language model, also known as a conversational AI or chatbot trained to be informative and comprehensive. It is trained on a massive amount of text data and is able to communicate and generate human-like text in response to a wide range of prompts and questions. You will need to join the waiting list before getting access, although in our experience the wait time is short. 
ChatGPT is OpenAI’s large language model based on the GPT (Generative Pre-trained Transformer) architecture. It is designed to understand and generate human-like responses to text-based conversations. You will need to create an account before using it. Phind is an AI-powered search engine designed specifically for developers asking technical questions. In addition to answering your coding question, it will provide links to the internet sources used to find the answer. Poe is Quora’s platform that gives you access to multiple AI chatbots, including Sage, based on OpenAI’s GPT-3.5 architecture; Claude, an AI natural language bot created by Anthropic; and ChatGPT. IDE-based chatbots CoPilot by GitHub is a language bot that interfaces with your editor. This bot provides coding suggestions and ways to speed up testing. GitHub describes it as an “AI pair programmer”. It uses a subscription model with both individual and business tiers. CodeWhisperer is the Amazon Web Services chatbot for the AWS cloud platform. It can be installed in your integrated development environment (IDE) using the latest AWS Toolkit plugin. The individual tier for CodeWhisperer is free, although you will need to create an AWS account and add a payment method. What are the data sources for these chatbots? Some high-profile English-language AIs, including Google’s T5 and Facebook’s LLaMA, were trained on more than 15 million websites (Schaul, Chen, and Tiku 2023). The top sources of training data for these AI models were https://patents.google.com/ and https://www.wikipedia.org/ ! 1.5 Reproducibility With AI Chatbots Reproducibility in programming is the ability to reproduce the results of an analysis using the same data and methods. It’s an important aspect of working with AI tools as well, particularly when using AI to debug or fix code. In order to maximize the reproducibility of your work, you should create good habits when working with any AI tool.
This includes documenting your prompts and the intentions behind them, as well as saving the response threads generated by an AI chatbot. Some of the current chatbots already save your threads (particularly ChatGPT and Phind). At this time, Bard and the chatbots accessible via Poe do not save queries as individual threads but do keep a record of the questions and responses for some time. However, it is best practice to save them yourself as the saved chat threads can be reset. Saving them yourself also allows you to write notes about your intentions. Keep in mind that the answers from the chatbots can change over time - don’t expect them to always be the same! Phrasing a query slightly differently can result in a different answer, as can asking in a new session. You should expect to ask the AI tool the same question in multiple ways in order to be confident in the answer. 1.6 How to Use This Book This book is written with the expectation that users will follow along and paste the queries from each example into their own AI chatbot session. After you have finished this book, you will be able to use whichever combination of chatbot and programming environment that you’re comfortable using. 1.6.1 Choice of AI Model You can use whichever AI you like for this course, as long as it is a large language model. (Other types of AIs are not explored in this course.) We will use a variety of browser-based chatbots as examples throughout this course. Some AIs will be better suited for your particular tasks, and it is too early to judge which of the technologies will end up dominating the industry. We encourage you to test multiple AI platforms to find the ones that best fit your needs, and the use of a particular AI in each chapter should not be taken as an endorsement by us. We have focused on the browser-based chatbots for our examples in this course for ease of demonstration. 
1.6.2 Computing Environment Needs All the examples and activities in this course can be done in the computer environment you prefer. None of the tasks require a huge amount of computing power, so local installations on your personal computer are fine, though everything will also translate to a high performance computing cluster or a cloud computing platform. In addition to the CodeWhisperer chatbot available on the AWS cloud platform, a variety of cloud-based AI tools and platforms exist that can be leveraged for practical projects after this course. For instance, Amazon SageMaker or Google Cloud AI Platform can be used for constructing and training machine learning models, while Microsoft Azure Cognitive Services can be applied to incorporate natural language processing features into your applications. If you’d like to learn more about how AI can be integrated with cloud computing, you can read about it in the Additional Learning section. 1.6.3 Boxes Throughout this course, you will see several types of specially-colored boxes. In particular, there are two to which you should pay special attention. One box highlights potential ethical issues with an AI usage. In this box, you’ll find reminders of the ethical considerations you should be particularly aware of while using AI to do certain tasks. The other box highlights our queries to AI chatbots. The question we posed to the AI chatbot is written in these boxes to make it easier for you to find. Wherever possible, we use screenshots of the actual answers we got from the AI chatbots in response to our questions. Occasionally, when we have asked the AI to write or fix code, we have formatted the answer in a way that makes it easier to use. When you see this format, you will know it indicates an AI response. Code in these boxes was written by AI. 1.7 About This Book We used AI bots (including ChatGPT, Bard, Sage, and Claude) to develop and write much of this course.
We firmly believe this new technology can be used in ethical ways and we put into practice all the rules and suggestions offered over the following modules while we created this book. "],["ethics-of-using-ai.html", "Chapter 2 Ethics of Using AI 2.1 Learning Objectives 2.2 Major concerns 2.3 Replacing Humans 2.4 Bias 2.5 Misinformation 2.6 Unintelligible Code 2.7 Faulty or Buggy Code 2.8 Security and Privacy issues 2.9 Violating Copyright 2.10 Harmful code 2.11 Summary", " Chapter 2 Ethics of Using AI The use of artificial intelligence (AI), and in particular generative AI, in coding has raised a number of ethical concerns. We will highlight several current concerns; however, please be aware that this is a dynamic field and the possible implications of this technology are continuing to develop. It is critical that we as a society continue to evaluate and predict what the consequences of the use of AI will be, so that we can mitigate harmful effects. 2.1 Learning Objectives In this chapter we will demonstrate how to: Describe key ethical concerns for using AI tools in software development Identify possible mitigation strategies for these major concerns Explain the potential benefits of being transparent about the use of AI tools to write code Discuss why human contributions are still important and necessary Recognize strategies for using AI tools to write code more responsibly Implement prompts to ask AI tools about responsible use throughout various coding practices 2.2 Major concerns In this chapter we will discuss the following issues that writing code with AI tools may contribute to: Replacing Humans - AI tools can help humans write code, but they are not a replacement. Bias - AI models are built on data and code that were created by biased humans; this bias can be further perpetuated. Misinformation - fake or manipulated data used to help design algorithms could be believed to be correct and this could be further propagated.
Unintelligible Code - Developers need to follow best practices with code generated by AI. Just like code generated by human developers, this includes figuring out how the code works, how it integrates with other code, as well as annotating and documenting how to use the code. Faulty or Buggy Code - Code that may not be optimal for a given situation may be inadvertently used by those with less familiarity, which may result in faulty software. Security or Privacy Issues - Code may not be optimized for security if not adequately evaluated. Generative AI models have access to large amounts of data that is currently unregulated and may include data that should be protected for privacy reasons. Copyright Violations - The generative AI model may have used code with copyright restrictions that require attribution or do not allow reuse, and it may not be clear where the code came from. Harmful Code - Currently it is not clear how well generative AI models restrict the creation of code that will be used for goals that harm others. Note that this is an incomplete list; additional ethical concerns will become apparent as we continue to use these new technologies. We highly suggest that users of these tools be transparent about the use of these tools, so that as new ethical issues emerge, we will be better prepared to understand the implications for specific coding projects. Be transparent about what AI tools you use to write code. This helps others to better understand how you created your code, as well as the possible sources that the AI tools might have used when helping you write code. It may also help with future unknown issues related to the use of these tools. Some organizations and scientific societies have created guidelines or requirements for using AI in journal articles and conference submissions, like the International Society for Computational Biology. Be aware of the requirements/guidelines for your field.
It is essential to address these ethical concerns and ensure that the use of AI in coding is done in a responsible and transparent manner. This could be done through ensuring the quality of the data used to train AI systems, promoting transparency in AI-generated code, and implementing safeguards against the creation of harmful or biased code. By doing so, we can harness the potential of AI to improve and transform the way we write and optimize code while maintaining ethical standards. 2.3 Replacing Humans Those who use AI tools to write code or to understand code need to recognize their own value in the process. While AI systems are useful, they do not replace the strengths that humans have for innovating new ways to write code, evaluating how the code integrates into the larger picture of a project, or in evaluating the downstream consequences of the code. Computer science is a field that has historically lacked diversity and new learners can often feel intimidated. There is the potential that new learners may feel even more discouraged as they learn to write code when witnessing AI tools write code. It is critical that we support diverse new learners of computer science, as we will continue to need human involvement in the development and use of AI tools. 2.3.1 Tips for supporting human code Follow best teaching practices for inclusion of diverse learners. Encourage current and future programmers to continue to innovate and write code. Avoid thinking that code written by computers must be better than code written by humans, as this simply is not true. Recall that humans first wrote the code that the AI tools use to write code. Be transparent about when you do or do not use AI tools, and give credit to the humans who write code where possible. A new term in the medical field called AI paternalism describes the concept that doctors (and others) may trust AI over their own judgment or the experiences of the patients they treat.
This has already been shown to be a problem with earlier AI systems intended to help distinguish patient groups. Not all humans will necessarily fit the expectations of the AI model if it is not very good at predicting edge cases (Hamzelou n.d.). Therefore, in all fields it is important for us to not forget our value as humans in our understanding of the world. 2.4 Bias One of the biggest concerns is the potential for AI to create biased code. AI systems are trained on data created by humans. If the data used to train the system is biased (and this includes existing code that may be written in a biased manner), the resulting code could also be biased. This could lead to discrimination, abuse, or neglect for certain groups of people, such as those with certain ethnic or cultural backgrounds, genders, ages, sexuality, capabilities, religions or other group affiliations. It is well known that data and code are often biased (Belenguer 2022). It is important that evaluations be made throughout the software development process to check for and consider potential perpetuation of bias. 2.4.1 Tips for avoiding bias Be aware of the potential biases in the data that is used to train AI systems. Check for possible biases within the data. Are there harmful data values? Examples could include discriminatory and false associations. Are the data adequately inclusive? Examples could include a lack of data about certain ethnic or gender groups or disabled individuals, which could result in code that does not adequately consider these groups, ignores them altogether, or makes false associations. Are the data of high enough quality? Examples could include data that is false about certain individuals. Evaluate the code for biases as it is developed. Check if any of the criteria for weighting certain data values over others are rooted in bias. Consider the possible outcomes of the use of the code.
Consider if the code could possibly be used in a manner that will result in discrimination. See Belenguer (2022) for more guidance. We also encourage you to check out the following video for a classic example of bias in AI: 2.5 Misinformation AI tools use data that may contain false information. AI tools may also report data as if it is real, when it is in fact not real. For example, currently at the time of the writing of this course, ChatGPT will report citations with links that are not correct. 2.5.1 Tips for reducing misinformation Be aware that some AI tools currently make up false information based on artifacts of the algorithm. Do not assume that everything that the algorithm produces is real or correct. 2.6 Unintelligible Code There is risk that those less aware of best coding practices use AI-generated code and do not follow these practices. This could make it difficult for others to understand how the code works and could make it hard to identify and fix any issues that may arise. This could result in negative consequences, such as system crashes or security breaches, that could have been avoided if the code had been written by an experienced and savvy human programmer. 2.6.1 Tips for avoiding unintelligible code Code should be reviewed by experienced programmers. Code should be annotated throughout to explain what the code is doing. Documentation should be created that describes how to use the code properly. Ask AI tools to help you understand the code, but get expert assistance where needed. Can you explain what the code in line 4 is doing? 2.7 Faulty or Buggy Code There is a risk that code written with AI tools, if not reviewed carefully enough, may be faulty. AI systems may not be trained on data that relates to current software dependencies or best practices. It is also important to remember that code generated by AI tools is not necessarily better than code written by humans. It requires just as much code review. 
2.7.1 Tips for reducing faulty or buggy code Do not assume that the code generated by AI is correct. Realize that AI is only as good or up-to-date as what it was trained on; the code may be generated using out-of-date code. Look up packages and functions used to ensure that the code is up-to-date. Make sure that you understand the code that you are using. AI can be used to help you understand what the code is doing, but consult with experts when needed. Ask the AI tools for extra information about whether there are any potential limitations or weaknesses in the code, but keep in mind that the tool may not be aware of issues and therefore human review is required. The information provided by the tool can, however, be a helpful starting point. Are there any limitations associated with this code? What assumptions were made in writing this code? Stack Overflow, a popular community-based website where programmers help one another, has (at the time of writing this) temporarily banned users from answering questions with AI-generated code. This is because users were posting incorrect answers to questions. It is important to follow this policy (as you may face removal from the community). This policy goes to show that you really need to check the code that you get from AI models. While they are currently helpful tools, they do not know everything. 2.8 Security and Privacy issues AI tools are not always trained in a way that is particularly conscious of data security. Furthermore, if the code is written using these tools by users who are less familiar with coding security concerns, protected data or important passwords may be leaked within the code itself. AI systems may also utilize data that was actually intended to be private. It is also important to consider what data your code might actually be using.
2.8.1 Tips for reducing security and privacy issues Check that all passwords, access tokens (like API keys), and security certificates are not saved in a public place where anyone can access or tamper with them. Check that no sensitive data, such as Personally Identifiable Information (PII), becomes public through the code. Utilize encryption and other security practices where needed. Consult with an expert about data security if you think your code could possibly cause someone to access protected data who is not authorized to access it. Advocate for regulation of AI tools by voting for such standards where possible. Ask AI tools for help, but do not rely on them alone. Are there any security or privacy concerns associated with this code? 2.9 Violating Copyright When AI systems are trained on data, they may also learn and incorporate code from that data. This means that AI-generated code could potentially infringe on the copyright of the original author of the code. For example, if an AI system is trained on a GitHub repository that contains code written by a human programmer, the AI system could generate code that is identical to or similar to the code in the GitHub repository. If the AI system then uses this code without permission from the original author, this could constitute copyright infringement. In general, we want programmers to feel comfortable sharing their code openly without fear they won’t be credited. Similarly, AI systems could potentially infringe on intellectual property rights by using code that is protected by trademarks or patents. For example, if an AI system is trained on a training manual that contains code that is protected by a trademark, the AI system could generate code that is identical to or similar to the code in the training manual. If the AI system then uses this code without permission from the trademark owner, this could constitute trademark infringement.
2.9.1 Tips for avoiding copyright violations Be transparent about what AI tools you use to write your code. Obtain permission from the copyright holders of any code that you use to train the AI system. Only use code that is in the public domain or that has been licensed for use by the AI system’s owner. Cite any GitHub repositories or training manuals you might use in your code. Ask the AI tools if the code it helped generate used any code that you can cite. Did this code use any code from others that I can cite? 2.10 Harmful code Another major concern is the use of AI to generate malicious code. For instance, AI could be used to create code that spreads malware or hacks into computer systems. This could cause severe damage to individuals and organizations, including data breaches and financial losses. 2.10.1 Tips for avoiding the creation of harmful code Be careful about what code you share publicly, as it could be used for malicious purposes. Consider how the code might be used by others. Ask the AI tools to help you, but do not rely on them alone. What are the possible downstream uses of this code? What are some possible negative consequences of using this code? 2.11 Summary Here is a summary of all the tips we suggested: Disclose when you use AI tools to write code. Credit human authors by citing their code and adhering to copyright restrictions. Ensure the code doesn’t expose sensitive data, such as passwords or PII. Don’t assume AI-generated content is real, accurate, or current. Ask the AI tools to help you understand: Sources for the code that you can cite What the code is doing Code limitations Potential security or privacy issues Potential downstream consequences of the code Always have expert humans review the code and value your own contributions and thoughts. Overall, we hope that these guidelines and tips will help us all to use AI tools more responsibly for writing code. 
We recognize, however, that this is an emerging technology and more ethical issues will emerge as we continue to use these tools in new ways. AI tools can even help us to use them more responsibly when we ask the right additional questions when writing code, but remember that human review is always necessary. Staying up-to-date on the current ethical considerations will also help us all continue to use AI responsibly. "],["writing-code.html", "Chapter 3 Writing Code 3.1 Learning Objectives 3.2 Coding Basics 3.3 Writing Code with AI 3.4 Tips for Coding with AI 3.5 Example One: Creating a Function 3.6 Example Two: Debugging Code 3.7 Example Three: Comparing Syntax 3.8 Example Four: Figuring Out a Common Task 3.9 Example Five: Planning Your Program 3.10 Example Six: Selecting a Coding Language 3.11 Hands On Exercise 3.12 Summary", " Chapter 3 Writing Code 3.1 Learning Objectives Recognize the benefits and capabilities of AI-powered programming tools, particularly for certain kinds of tasks that are difficult for humans Explain how AI-generated code and automated programming tools contribute to making programming more accessible to individuals with limited coding experience or to those working in a new language Explore the ethical considerations associated with using AI in programming, including the importance of rigorous testing before publishing code Apply strategies for using AI to build functions, debug code, understand error messages, check syntax, choose the proper programming language, and plan out a multi-step programming project 3.2 Coding Basics In today’s digitally-driven world, coding has become an essential skill. It is used to create websites, mobile applications, and sophisticated software systems. Contrary to popular belief, coding is not only for those who possess an innate talent for programming. With the right resources and a willingness to learn, anyone can become proficient in coding.
Practice is crucial to mastering the art of coding, and those who are willing to commit themselves to the learning process can achieve great results. Writing code can also be a rewarding and fulfilling hobby. Creating your own software, building a website, or automating a task can be immensely satisfying, and can provide a sense of accomplishment and pride. Moreover, writing code can be a form of creative expression. Just as musicians use notes to compose music and artists use paint to create visual art, programmers use code to create digital experiences. From building websites to creating video games, code can be used to bring ideas to life in exciting and innovative ways. Ultimately, the ability to write code is a valuable asset in today’s world, with increasing demand for skilled programmers in a wide range of industries. Whether as a career or hobby, coding can be both personally and professionally rewarding. The accessibility and versatility of coding make it an excellent skill for anyone to learn, regardless of their background or previous experience. 3.3 Writing Code with AI AI-powered programming tools and technologies are revolutionizing the way we approach software development. AI can generate code snippets. An AI-powered code editor can suggest more efficient algorithms or highlight potential errors in real-time. It can also help developers to reproduce bugs more easily and to test potential fixes more quickly. This can save developers a significant amount of time and effort, and it can help them to debug their code more effectively. Be aware that AI still makes a lot of mistakes when writing code, so you will need to be particularly careful in testing code it writes. AI can make programming more accessible. With AI-generated code and automated programming tools, individuals who are not experienced developers can still create software solutions. Experienced developers can also more easily write code in languages with which they aren’t familiar.
This has the potential to democratize programming and lead to new innovations from a wider range of people. Can complete novices learn how to program using AI tools? At present, it is very challenging for complete novices to rely on AI chatbots to learn how to write code. These tools will sometimes write buggy code or code that does not work as intended, and a person with no programming experience will have difficulty learning how to spot these sorts of mistakes. AI chatbots are best thought of as a supplement to your software development education, not as a replacement, and they should be used alongside other resources. Therefore, we advise that you seek expert review and assistance. As AI tools are refined, complete novices may be able to create software products more readily on their own. AI is changing the skills required for writing code. Rather than simply memorizing syntax, developers need to focus on developing their creativity, intuition, and problem-solving skills. Programmers must also hone their abilities in architecture design and project management, as these skills are becoming increasingly important in the modern software development landscape. As AI continues to evolve, it is likely to have an even greater impact on the way we develop software in the future. As a programmer using AI, it is up to you to be responsible for what you create. You must rigorously test any code you write. It is your job to make sure any code you create with AI is not malicious and works as expected. 3.4 Tips for Coding with AI Using AI to write code is as much of an art as writing your own code. Following a few rules and suggestions will make your experience much more enjoyable and productive. Remember that AI cannot read your mind. You need to tell it what you’re thinking, and it may take several rounds of question-and-answer before you get something you can work with. Be clear about your goals. What do you want the AI to do? Be as specific as possible.
Start with a simple task. Don’t try to do too much too soon. Start with a simple task and then gradually work your way up to more complex tasks. You wouldn’t write a complete program all at once, and you shouldn’t have AI do it, either. Provide enough data. The more data (and detail) you can give the AI, the better it will be able to learn and perform the task you want it to do. Be patient. Querying an AI can take time. Don’t expect it to be perfect right away. Test everything AI writes. You are responsible for the code that AI generates, even if you didn’t write it yourself. AI can be biased, just like humans, and this could lead to suboptimal development. There is a concern that AI may focus development on areas that optimize certain metrics but not overall quality, effectiveness, or user needs. Human judgment is still needed to ensure that AI is used in a responsible and ethical way. Test your results. Make sure the AI is actually doing what you want it to do. AI systems are trained on data, and if the data is not representative of the real world, the AI system may not be able to generate code that works correctly. It is important to test the code to make sure that it is working as intended and that it is producing the desired results. Inadequate testing could lead to low-quality, buggy code. Since AI-generated code is highly automated, it requires strong testing procedures and safety oversight to ensure it meets standards before deployment. Make sure your code is secure. AI systems can be used to generate code that is vulnerable to attack, and it is important to test the code to make sure that it is secure. This can help to prevent security breaches from occurring. Use a variety of AI models. There are many different AI models available. Experiment with different models to see which one works best for your task. Get feedback from others. Ask other people to review your code and give you feedback.
This can help you identify and fix any problems with your code. 3.5 Example One: Creating a Function One of the most common things a programmer does is write functions to automate tedious tasks. With proper guidance, AI can speed this process up. Perhaps you need to write a quick function that reverses all the letters in a string. We can query Phind for help. How do I write a function that reverses the letters in a string? To reverse a string in Bash, there are multiple approaches. One way to reverse a string is by using the rev command. According to StackOverflow, the rev command is installed on most Linux systems by default. To use the rev command, simply enter rev followed by the string you want to reverse. For example: $ echo "12345" | rev However, it is important to note that the rev command is not present in all flavors of Unix. Another way to reverse a string is by using a simple Bash script. According to StackOverflow, we can write a Bash script that reverses a string using a for loop. Here is an example script: var="12345" copy=\\${var} len=${#copy} for((i=$len-1;i>=0;i--)); do rev="$rev\\${copy:$i:1}"; done echo "var: $var, rev: $rev" This script takes the string “12345” and stores it in the variable var. It then creates a copy of the string and stores it in the variable copy. The length of the string is stored in the variable len. The for loop starts at the end of the string and iterates through each character, adding it to the variable rev. Finally, the script prints the original string and the reversed string. Another approach to reversing a string is by using the tac command. According to StackOverflow, we can use the tac command to reverse the contents of a file or the standard input. Here is an example command: $ echo "12345" | sed 's/./&\\n/g' | tac | sed -e :a -e 'N;s/\\n//g;ta' This command uses the sed command to insert a newline character after each character in the string. 
It then passes the output to the tac command, which reverses the order of the lines. Finally, another sed command is used to remove the newline characters and print the reversed string. Overall, there are multiple approaches to reversing a string in Bash. The rev command, a simple Bash script with a for loop, and the tac command are all viable options. Phind offers you a variety of ways to do this, but all these options are in the language Bash. Because we didn’t include a programming language in our query, Phind picked one for us. Let’s refine our query a little. How do I write a function using Python that reverses the letters in a string? To write a function using Python that reverses the letters in a string, there are several approaches that can be taken. One approach involves using a for loop and the concatenation operator to join the original string in reverse order. This technique is demonstrated in realpython.com as follows: def reverse_string(text): result = "" for char in text: result = char + result return result print(reverse_string("Hello, World!")) # Output: '!dlroW ,olleH' In this example, the function reverse_string takes a string as its argument and initializes an empty string result. The for loop iterates over each character in the input string, concatenating it with the current content of result and reassigning the new intermediate string to result at each iteration. At the end of the loop, result holds a new string as a reversed copy of the original one. Another approach involves using slicing to reverse the string. This technique is demonstrated in stackoverflow.com as follows: def reverse_string(text): return text[::-1] print(reverse_string("Hello, World!")) # Output: '!dlroW ,olleH' In this example, the function reverse_string takes a string as its argument and returns the string in reverse order using slicing. The syntax text [::-1] means to slice the entire string with a step of -1, which reverses the order of the string. 
A third approach involves using Python’s built-in reversed() function to reverse the string. This technique is demonstrated in datagy.io as follows: def reverse_string(text): return ''.join(reversed(text)) print(reverse_string("Hello, World!")) # Output: '!dlroW ,olleH' In this example, the function reverse_string takes a string as its argument and uses the reversed() function to return an iterator that produces the characters of the string in reverse order. The join() method is then used to convert the iterator back into a string. All three approaches produce the same output, but they differ in terms of performance and readability. The first approach using a for loop and concatenation is the most straightforward, but it can be slower for large strings because it creates successive intermediate strings only to throw them away in the next iteration. The second approach using slicing is more efficient because it creates a new string object only once, but it may be less readable for some programmers. The third approach using reversed() is concise and readable, but it may be slower for large strings because it creates a reversed iterator that must be converted back into a string using join(). Once again, Phind has come through with multiple options. Now let’s say you knew you wanted to write a function like this, but you realize it’s only the strings in the third column of your file. Let’s add that requirement to our query. “How do I write a function using python that reverses the letters in a string for the third column of my file?” To write a function using Python that reverses the letters in a string for the third column of a file, you can use the extended slice syntax in Python. The extended slice syntax is a more concise and faster way of slicing strings in Python. 
To reverse a string using extended slices, you can use the following syntax: string[::-1] In this syntax, the first colon indicates the start of the slice, the second colon indicates the end of the slice, and the third colon indicates the step size of the slice. By setting the step size to -1, you can reverse the string. To create a function that reverses the letters in a string for the third column of a file, you can use the following code: def reverse_string(string): return string[:2] + string[2:5][::-1] + string[5:] In this code, the reverse_string() function takes a string as input and returns the string with its third column reversed. The first two characters of the string are returned as-is, while the third column (characters 3 to 5) are reversed using extended slices. Finally, the remaining characters of the string are returned as-is. There are other ways to reverse a string in Python, including using loops, recursion, stacks, and function calls. However, using extended slices is the most concise and efficient way to reverse a string in Python. Here are some pros and cons of using extended slices to reverse a string: Pros: Concise syntax: using the extended slice syntax is a one-liner that is easy to read and write. Fast execution: using the extended slice syntax is faster than using loops or other methods to reverse a string. Cons: Limited functionality: the extended slice syntax can only be used to reverse a string, and cannot be used for more complex string operations. Not beginner-friendly: the extended slice syntax is not intuitive for beginners who are not familiar with Python’s slicing syntax. In conclusion, to write a function using Python that reverses the letters in a string for the third column of a file, you can use the extended slice syntax in Python. This syntax is concise and efficient, but may not be beginner-friendly and may not be suitable for more complex string operations. 
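Note that the response above interprets "third column" as character positions 3 to 5 of a single string, which may not be what the query meant. If the file is delimited text, a minimal sketch might look like the following. It assumes a comma-separated file with the target text in the third field of each row; the function name reverse_third_column and the sample data are hypothetical and are not part of Phind's answer.

```python
import csv
import io


def reverse_third_column(lines, delimiter=","):
    """Return rows with the text in the third column reversed.

    Assumes `lines` is an iterable of delimited text lines (for example an
    open file object). Rows with fewer than three fields are left unchanged.
    """
    rows = []
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) >= 3:
            row[2] = row[2][::-1]  # slicing with a step of -1 reverses the string
        rows.append(row)
    return rows


# Hypothetical sample data standing in for a real file.
sample = io.StringIO("1,Alice,abc\n2,Bob,xyz\n")
for row in reverse_third_column(sample):
    print(row)
```

This sketch treats every row the same way, so a header row would also have its third field reversed; skipping the first row would be a reasonable adjustment if the file has one. As with any AI-assisted code, it is worth checking the result against a few rows of your own data.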
3.6 Example Two: Debugging Code A vast amount of time in programming is spent on debugging code. Often the compiler error messages are very precise about what has gone wrong while simultaneously lacking enough detail for the programmer to immediately understand how to fix the error. It is good practice to debug code as you go, instead of writing the entire program and then testing to identify the errors. Using an AI chatbot makes this task much, much easier. There are two options for using ChatGPT for debugging. First, we could query what the error code means. Here’s a Python code snippet that is throwing a TypeError: unsupported operand type(s) for +: 'int' and 'str' error when we try running it. def add_numbers(a, b): result = a + b return result number1 = 10 number2 = "20" sum = add_numbers(number1, number2) print("The sum of", number1, "and", number2, "is", sum) What does this error mean? TypeError: unsupported operand type(s) for +: ‘int’ and ‘str’ This error typically occurs when you try to concatenate (i.e., add together) a string and an integer using the “+” operator in Python. In Python, the “+” operator is used for both addition and concatenation depending on the data types of the operands. If you try to add a string and an integer using the “+” operator, Python will raise a TypeError because these two data types are not compatible for this operation. For example, if you have the following code: age = 25 message = "I am " + age + " years old." Python will raise the TypeError because you are trying to concatenate a string (“I am”) with an integer (25) using the “+” operator. To fix this error, you can convert the integer to a string using the str() function: age = 25 message = "I am " + str(age) + " years old." In this case, the str() function converts the integer age to a string before concatenating it with the other string, so the operation is valid and no error will be raised. 
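Returning to the add_numbers snippet at the start of this example, the same idea can be sketched as follows. This is one possible fix, assuming the string "20" is meant to be treated as a whole number; converting it with int() and renaming the result variable to total are our own choices, not part of ChatGPT's answer.

```python
def add_numbers(a, b):
    # Both arguments must be numeric for + to mean addition.
    return a + b


number1 = 10
number2 = "20"

# Convert the string to an integer before adding
# (assumes the string holds a whole number).
total = add_numbers(number1, int(number2))
print("The sum of", number1, "and", number2, "is", total)
```

If the intent were instead to join the values as text, the opposite conversion, str(number1), would be the matching fix.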
ChatGPT helpfully explains what TypeError means, what causes it, and how to fix it. This is quite useful when you don’t understand the error message. Another option is to submit the code snippet that is causing an error to ChatGPT directly and ask it to fix the mistake. Here’s another buggy code snippet, this time in C#. Can you debug this code? using System; class Program { static void Main() { int[] numbers = { 1, 2, 3, 4, 5 }; int sum = 0; for (int i = 1; i <= numbers.Length; i++) { sum += numbers[i]; } Console.WriteLine("The sum of the numbers is: " + sum); } } Sure, I can help you debug this code. Here are the issues I see: The loop is going out of bounds: The loop is iterating from 1 to numbers.Length, but array indexes start from 0. Therefore, when i is equal to numbers.Length, numbers[i] is trying to access an element outside the bounds of the array, which will result in an IndexOutOfRangeException. To fix this, you should change the loop condition to i < numbers.Length. The loop is not summing the array elements correctly: The loop is adding the elements starting from the second element instead of the first. To fix this, you should change the initial value of i to 0, which corresponds to the first element of the array. Here’s the corrected code: using System; class Program { static void Main() { int[] numbers = { 1, 2, 3, 4, 5 }; int sum = 0; for (int i = 0; i < numbers.Length; i++) { sum += numbers[i]; } Console.WriteLine("The sum of the numbers is: " + sum); } } This should correctly sum the elements of the numbers array and output the result. If you choose to submit your code directly to an AI chatbot to debug, submit one snippet at a time (if possible). Otherwise the answer can get quite unwieldy! 3.7 Example Three: Comparing Syntax You may be in the middle of writing code when you realize that you’ve used two possible syntaxes for the same command.
Perhaps you are using the R package here and are wondering if two different ways of writing a command are the same. Is here::here(\"raw\", \"data.csv\") equivalent to here::here(\"raw/data.csv\")? First, let’s see what ChatGPT responded. When we first started putting together this course, we had intended to include an answer from Bard that disagreed with the answer from ChatGPT as an example of why it can be helpful to query multiple chatbots. (We decided ChatGPT was correct after verifying its answer in the here package documentation.) However, when we queried Bard again two weeks later, we were pleasantly surprised to discover that Bard had corrected its answer. Now it’s a cool little example of how the chatbots are rapidly improving and “learning” from their mistakes. 3.8 Example Four: Figuring Out a Common Task Sometimes you have a task that needs to be done that is fairly routine. You might not have done it before, but plenty of people in the world have. In this case, you don’t want to reinvent the wheel, but you may not know exactly where to start. AI is a great option for figuring out the basic workflow steps. A very common task for anyone working with genomic samples is to combine multiple files in the vcf format. Perhaps you’ve been asked to do this and then create a separate file that contains the names of the samples. You turn to ChatGPT for help. Can you write a program that merges vcf files and then outputs the sample names? Great! Of course, maybe you’re not comfortable using bcftools, or you try this and can’t get it to work. You can take advantage of the “Regenerate Response” option in ChatGPT to get another solution. What’s another way I can do this? You can also specify some of the features you’d like in your code, or where you might want the code to come from. Maybe you want to practice using GitHub repositories, so you specifically request it. Can you write a program that merges vcf files and then outputs the sample names using a github repo?
If you know you’ve used something like a GitHub repository, you can ask ChatGPT for the citation. Just be aware that any citation it gives you should be checked independently. Unfortunately, AI bots can still respond with imaginary journal articles. (Assume that if you can’t find it, it doesn’t exist.) What’s the citation for the repo used in this code? In general, after you find a solution that works for you, you should always make a habit of asking the AI where it got the code. Never, ever, ever run code without knowing where it came from. What’s the source for this code? You should always take care that you are only using code that you’re authorized to use when writing code. This means asking the AI chatbot where code snippets are from and tracking down those citations and authorizations yourself (instead of just trusting the AI that everything is okay). 3.9 Example Five: Planning Your Program Let’s say we want to create software that plays the card game blackjack. You can jump right in and ask your AI chatbot to write the code, but you’re unlikely to be successful and incredibly likely to get frustrated. Before you start interacting with the AI, you need to plan out what you want to do. Everyone needs a plan of attack when building software. You should start by identifying what your software will do, then you can figure out the steps necessary to make that happen. We’ll ask Claude-instant (as implemented by Poe) how we can go about creating the necessary code for our game. How would I go about writing a program that can play blackjack? That seems like a really good roadmap for building a basic game, but if you’re ambitious (and of course you are), you might want to give your game a little extra something. Claude-instant has some ideas for that, too. What are some other features I could add to this game? At this point, you are ready to begin writing your program. You have a lot of tasks to do, so you have to pick a logical starting point.
If you’re uncertain, you can ask AI for advice on your first task. What should be my first coding task for this program? In this case, the AI went beyond just suggesting a first task and also offered several options for how to do it as well. 3.10 Example Six: Selecting a Coding Language You will need to choose the language for your software. If you don’t, AI will likely pick a language for you, which may or may not be one you want. You might choose a language based on: Your preexisting knowledge or experience with the language A specific use case where the language is commonly used The language(s) your colleagues or coworkers use You can find a description of other languages you might encounter here. Let’s ask Claude-instant again for some recommendations on the appropriate languages for our proposed blackjack game from the above example. (Pro tip: the various AI bots all keep track of your previous queries, so you don’t have to restate your goal for the software if you’re continuing the conversation.) What would be a good coding language to create this game of blackjack? Claude-instant actually gave us more information than we anticipated. If it hadn’t stated pros and cons while offering language suggestions, we could ask for them directly. This could be helpful if you’re deciding between two options like, say, Python or Java. What are the pros and cons of using Python vs Java for this project? 3.11 Hands On Exercise Now it’s your turn to try! 3.11.1 Code Here’s the code you need for question #2. 
def calculate_average(numbers): total = 0 for number in numbers: total += number average = total / len(numbers) return average numbers = [1, 2, 3, 4, "5"] average = calculate_average(numbers) print("The average of the numbers is:", average) 3.11.2 Questions Write a prompt to create a function that replaces all instances of the string “three” with the number 3, then create a prompt that allows you to modify the function so that “three”, “Three”, and “THREE” are all replaced with the number 3. Write a query that helps you debug the code snippet from above. Create a query that explains how to retrieve data from an online database. Don’t forget to find out what the source of the code is. Write a prompt to plan out a program that creates and plays a game of Rock, Paper, Scissors. You should create a multi-step plan as well as figure out what your first coding task should be. Create a prompt that identifies the best coding language for the above game. 3.12 Summary Coding has become an essential skill in today’s world. AI can increase the efficiency of planning, writing, and debugging code, as well as make coding more accessible for beginners. Querying AI on code writing tasks requires patience. AI cannot read your mind, and users need to be both specific and patient when asking for coding help. It is essential to test all AI-written code while in development, since inadequate testing could lead to low quality, buggy code. A programmer is responsible for what they create while coding with AI, even if AI writes the actual code. 
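The point about testing AI-written code can be made concrete with a short sketch. Assuming the slicing-based reverse_string function from Example One, a few assertions check both ordinary and edge-case inputs. The specific test cases and the test_reverse_string name are our own illustration, not a complete test suite.

```python
def reverse_string(text):
    # Slicing with a step of -1 returns a reversed copy of the string.
    return text[::-1]


def test_reverse_string():
    assert reverse_string("Hello, World!") == "!dlroW ,olleH"
    assert reverse_string("") == ""    # empty input
    assert reverse_string("a") == "a"  # single character
    # Reversing twice should restore the original input.
    assert reverse_string(reverse_string("abc")) == "abc"


test_reverse_string()
print("All checks passed")
```

Checks like these are quick to write and can be rerun whenever the AI-generated function is regenerated or edited.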
devtools::session_info() ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2023-07-14 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2023-03-28 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.0.2) ## cachem 1.0.7 2023-02-24 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## curl 4.3 2019-12-02 [1] RSPM (R 4.0.3) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) ## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) ## evaluate 0.20 2023-01-17 [1] CRAN (R 4.0.2) ## fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## highr 0.8 2019-03-20 [1] RSPM (R 4.0.3) ## hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0) ## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.0.2) ## httr 1.4.2 2020-07-20 [1] RSPM (R 4.0.3) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2023-03-28 [1] Github (yihui/knitr@a1052d1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.0.2) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## ottrpal 1.0.1 2023-03-28 [1] Github (jhudsl/ottrpal@151e412) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.3) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 
2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 [1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## readr 1.4.0 2020-10-05 [1] RSPM (R 4.0.2) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.0 2023-03-14 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2023-03-28 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.2) ## sass 0.4.5 2023-01-24 [1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2023-03-28 [1] Github (R-lib/testthat@e99155a) ## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.0.2) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## utf8 1.1.4 2018-05-24 [1] RSPM (R 4.0.3) ## vctrs 0.6.1 2023-03-22 [1] CRAN (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2023-03-28 [1] Github (yihui/xfun@74c2a66) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["refactoring-code.html", "Chapter 4 Refactoring Code 4.1 Learning Objectives 4.2 Refactoring Basics 4.3 AI in Code Refactoring 4.4 Example One: Correcting Syntax 4.5 Example Two: Styling and Convention 4.6 Example Three: Staying DRY 4.7 Example Four: Keeping Code Alive 4.8 Example Five: Making Code Readable 4.9 Example Six: Keep it Brief 4.10 Code optimization 4.11 Challenges and Limitations 4.12 Hands-On Exercise 4.13 Summary", " Chapter 4 Refactoring Code 4.1 Learning Objectives Describe how refactoring code involves optimization for maintainability, efficiency, and reuse Explain why refactoring code is important for developers in the long-term Recognize the benefits and limitations of using AI tools to refactor code, as well as why AI tools are uniquely poised to be beneficial Implement prompt strategies that can be used to assist with refactoring code for correcting syntax, for adopting more consistent 
styling, for making code more concise, for making code easier to maintain, and for making code more efficient 4.2 Refactoring Basics Code refactoring is the process of improving the quality of underlying code without changing its functionality. In other words, it’s a way of cleaning up and optimizing code so that it’s easier to maintain and more efficient. This often involves making small changes to the code, such as renaming variables or functions, reorganizing code blocks, or simplifying complex expressions. Refactoring is an essential practice in software development and helps to ensure that the codebase remains manageable and adaptable as requirements and business needs change over time. Code refactoring helps to reduce technical debt, which is the accumulation of development work that needs to be done in the future as a result of taking shortcuts or using less than optimal solutions. Refactoring can help to identify potential issues and bugs before they cause problems, resulting in more stable software. Finally, refactoring code to be more readable and less verbose can create fewer headaches for you in the future. To understand why code refactoring is important, imagine trying to cook in a disorganized and cluttered kitchen. You might be a fantastic cook, but you are spending more time searching for utensils and ingredients, and the quality of the meal could suffer as a result. Similarly, code that is disorganized and cluttered can be difficult to work with and can result in lower-quality software. Refactoring is like cleaning and organizing the kitchen - it makes the process of cooking (or coding) smoother and more efficient. 4.3 AI in Code Refactoring Code refactoring has historically been done manually by developers. This involves reviewing code and identifying areas that could be improved or optimized, and then making changes to the codebase accordingly. 
Though important, this process is time-consuming and labor-intensive, as it requires developers to carefully review every line of code to identify potential issues or areas for improvement. Additionally, manual code refactoring is error-prone, as developers can accidentally introduce bugs or errors into the codebase while making changes. However, AI has significant potential to help with code refactoring. AI can use machine learning algorithms to analyze large amounts of code and identify patterns or areas that could be improved. For example, they can identify sections of code that are redundant, overly complex, or difficult to maintain, and suggest changes that could be made to improve the codebase. Machine learning algorithms can also help to identify potential bugs or security issues in the codebase, which can help to improve the overall quality and stability of the software. AI refactoring can also be faster and more accurate than manual refactoring. This is particularly useful for large-scale software projects with massive codebases, where manual code review and refactoring can be an enormous task. In the next sections, we’ll take a look at some examples of using AI to refactor code. 4.4 Example One: Correcting Syntax Programming languages operate using their established set of rules. For example, Python uses indentation to define code blocks. However, when you shift between development environments, indentation and line breaks can sometimes be affected, causing major headaches! Let’s say we run the following code: def calculate_sum(a,b): result = a + b return result def calculate_product(a,b):result = a * b return result def main(): a=5 b=6 sum_result = calculate_sum(a, b) print("Sum:", sum_result) product_result = calculate_product(a,b) print("Product:", product_result) if __name__ == '__main__': main() ## unexpected indent (<string>, line 3) We get the error above because our indentation is off. Let’s ask ChatGPT to clean things up.
Can you assist me with correcting the formatting and indentation issues in my Python code? Of course! Here is the code after correcting the formatting and indentation: def calculate_sum(a, b): result = a + b return result def calculate_product(a, b): result = a * b return result def main(): a = 5 b = 6 sum_result = calculate_sum(a, b) print("Sum:", sum_result) product_result = calculate_product(a, b) print("Product:", product_result) if __name__ == '__main__': main() ## Sum: 11 ## Product: 30 Much better! This can be quite handy when you can’t immediately see what is wrong with the formatting or if you are working with a large codebase. 4.5 Example Two: Styling and Convention Sometimes programming languages have more flexible rules where there are multiple ways to take the same or almost the same action. For example, in the R programming language, there are several ways to perform variable assignment. However, it is generally accepted among R programmers that the <- operator should be used. Single and double quotes can be used for character types interchangeably, but it’s generally a good idea to stick with one or the other. 4.5.1 Commonly Accepted Convention Let’s say we have the following code: a <- "apple" b = 'banana' "carrot" -> c c(a, b, c) ## [1] "apple" "banana" "carrot" This code does not produce any errors, but it doesn’t follow typical R convention. Let’s ask ChatGPT to clean things up. Can you help me reformat my R code to follow accepted style conventions? Sure, here’s the cleaned-up R code: a <- "apple" b <- "banana" c <- "carrot" Much better! This code will now look more familiar to other R programmers. 4.5.2 Visual Styling Some languages are flexible about line breaks, indentation, and spacing among code blocks. If you’re rapidly developing, things can get messy pretty quickly. Many development environments, for example RStudio and PyCharm, have the ability to format your code for you without changing any of the code itself.
However, you can also get AI to do this according to your own preferences. d <- c(rep(a,5),rep(b,5),rep(c,5),paste(a,b,c),paste(c,b,a)) This code is pretty hard to read. Let’s see if AI can do better. Can you space out this R code to make it more readable? Sure! Here’s the same code with added spacing for improved readability: d <- c(rep(a, 5), rep(b, 5), rep(c, 5), paste(a, b, c), paste(c, b, a)) Better! It’s easier to interpret because only one action is being taken on each step. By spacing out the code and breaking it into multiple lines, it becomes easier for the human eye to scan for structure, logic, errors, and/or inconsistencies. 4.6 Example Three: Staying DRY The DRY (Don’t Repeat Yourself) programming tenet is essential to good software. By avoiding repetition in code, we can reduce the amount of duplicate code we need to write and maintain. When code is repeated and we need to change the functionality of a certain piece of it, we have to make the same change in multiple places, which can be time-consuming and error-prone. Let’s take the following Python code: def print_greetings(): print("Hello, Alice!") print("Hello, Bob!") print("Hello, Carol!") print("Hello, David!") print("Hello, Eve!") print_greetings() ## Hello, Alice! ## Hello, Bob! ## Hello, Carol! ## Hello, David! ## Hello, Eve! Notice how repetitive this code is? print() and the string \"Hello, \" appear five times. We can improve this! Could you please refactor my Python code to make it more DRY? Sure, here’s a refactored version of your code: def print_greeting(name): print(f"Hello, {name}!") def print_greetings(): names = ["Alice", "Bob", "Carol", "David", "Eve"] for name in names: print_greeting(name) print_greetings() ## Hello, Alice! ## Hello, Bob! ## Hello, Carol! ## Hello, David! ## Hello, Eve! Now let’s imagine your software lead has asked us to change the greeting to Spanish.
Notice how we only have to change the second line of code: def print_greeting(name): print(f"¡Hola, {name}!") def print_greetings(): names = ["Alice", "Bob", "Carol", "David", "Eve"] for name in names: print_greeting(name) print_greetings() ## ¡Hola, Alice! ## ¡Hola, Bob! ## ¡Hola, Carol! ## ¡Hola, David! ## ¡Hola, Eve! Much easier. 4.7 Example Four: Keeping Code Alive During software development, code is often changing rapidly. It’s common to end up with functions or variables you don’t need anymore. It’s best practice to remove “dead” code from your codebase: it’s distracting and can make maintenance more challenging. Programmers should resist the urge to hang on to unused code in the expectation that they might need it one day. Version control, such as that provided by Git and GitHub, lets you keep a record of old code without cluttering your final version. Let’s take a look at some Python code with some dead code: def default_values(): a = 5 b = 6 return a, b def calculate_sum(a, b): result = a + b return result def calculate_product(x, y): return x * y def simulate_data(): a, b = default_values() dat = calculate_sum(a, b) + calculate_sum(a, b) return dat def main(): a = 5 b = 6 a, b = default_values() sum_result = calculate_sum(a, b) print("Sum:", sum_result) if __name__ == '__main__': main() ## Sum: 11 Some of these functions aren’t used in the final step, typically defined in Python as the main() function. Let’s ask AI to remove them. I need help with eliminating unused functions in my Python script. Sure! Here’s a refactored version: def default_values(): a = 5 b = 6 return a, b def calculate_sum(a, b): result = a + b return result def main(): a = 5 b = 6 a, b = default_values() sum_result = calculate_sum(a, b) print("Sum:", sum_result) if __name__ == '__main__': main() ## Sum: 11 Great! Now our code is much more manageable. This code only includes the necessary functions needed to run the main() function. We can go a step further!
Can you help me refactor my python code to make it more concise? Certainly! Here’s a more concise version of your code: def calculate_sum(a=5, b=6): return a + b def main(): sum_result = calculate_sum() print("Sum:", sum_result) if __name__ == '__main__': main() ## Sum: 11 In this version, the calculate_sum function is simplified to take default arguments for a and b. This means the default_values function is no longer necessary. The a and b variables are also removed from main, as they are no longer needed. Much better and easier to look at overall. 4.8 Example Five: Making Code Readable Usually, our first goal as programmers is to get the response we need from the code. Readability might be the last thing on our minds. Luckily, AI can help with this. Let’s imagine we whipped up this code pretty quickly. It is tough to read, and it could definitely be improved. for(term in c("good","bad","ugly")){ print(c(rep(term,3),paste("really",stringr::str_to_upper(term)))) } ## [1] "good" "good" "good" "really GOOD" ## [1] "bad" "bad" "bad" "really BAD" ## [1] "ugly" "ugly" "ugly" "really UGLY" Let’s see if AI can do better. Can you help me clean up this R code so it is more readable? Sure! Here’s a more readable version of the code: library(stringr) for(term in c("good", "bad", "ugly")) { print(c(rep(term, 3), paste("really", str_to_upper(term)))) } ## [1] "good" "good" "good" "really GOOD" ## [1] "bad" "bad" "bad" "really BAD" ## [1] "ugly" "ugly" "ugly" "really UGLY" I added whitespace around operators and after commas for better readability. I also loaded the stringr package explicitly using library() to make the code more self-contained. This is a good start, but not much has changed. Let’s try again. On ChatGPT, we can click “Regenerate Response”. 
Here’s a cleaned-up version of the R code that is more readable: library(stringr) terms <- c("good", "bad", "ugly") for (term in terms) { rep_term <- rep(term, 3) upper_term <- str_to_upper(term) really_upper_term <- paste("really", upper_term) result <- c(rep_term, really_upper_term) print(result) } ## [1] "good" "good" "good" "really GOOD" ## [1] "bad" "bad" "bad" "really BAD" ## [1] "ugly" "ugly" "ugly" "really UGLY" I’ve stored the repeated term and the uppercased term in separate variables to make the code more readable. I’ve also separated the construction of the “really” phrase from the uppercased term to make it clearer what’s happening. Finally, I’ve used whitespace and indentation to make the structure of the code more apparent. Better! While the code takes up a greater number of lines, it’s easier to interpret because only one action is being taken on each step. 4.9 Example Six: Keep it Brief As we learned above, following the DRY principle and removing dead code can make your code easier to interpret, maintain, and debug. However, sometimes the issue is the functions themselves. No one has perfect knowledge of everything in their programming language. AI can help us determine if there are any more efficient sets of functions that do the same thing. Here is some code that plays with the “CO2” dataset in R: library(datasets) for (type in unique(CO2$Type)) { dat_subset <- CO2[CO2$Type == type, ] for (plant in unique(dat_subset$Plant)) { dat_subset_2 <- dat_subset[dat_subset$Plant == plant, ] print(paste(dat_subset_2$Plant[1], round(mean(dat_subset_2$uptake), 1))) } } ## [1] "Qn1 33.2" ## [1] "Qn2 35.2" ## [1] "Qn3 37.6" ## [1] "Qc1 30" ## [1] "Qc2 32.7" ## [1] "Qc3 32.6" ## [1] "Mn1 26.4" ## [1] "Mn2 27.3" ## [1] "Mn3 24.1" ## [1] "Mc1 18" ## [1] "Mc2 12.1" ## [1] "Mc3 17.3" Let’s see if AI has any ideas for making it more concise. Can you make this code more concise? 
Here’s a more concise version using dplyr package in R: library(dplyr) CO2 %>% group_by(Type, Plant) %>% summarize(mean_uptake = mean(uptake)) ## # A tibble: 12 × 3 ## # Groups: Type [2] ## Type Plant mean_uptake ## <fct> <ord> <dbl> ## 1 Quebec Qn1 33.2 ## 2 Quebec Qn2 35.2 ## 3 Quebec Qn3 37.6 ## 4 Quebec Qc1 30.0 ## 5 Quebec Qc3 32.6 ## 6 Quebec Qc2 32.7 ## 7 Mississippi Mn3 24.1 ## 8 Mississippi Mn2 27.3 ## 9 Mississippi Mn1 26.4 ## 10 Mississippi Mc2 12.1 ## 11 Mississippi Mc3 17.3 ## 12 Mississippi Mc1 18 This code groups the CO2 dataset by Type and Plant, calculates the mean uptake for each group using summarize, and then prints the resulting data frame. The code is shorter and more concise because it uses the dplyr package, which has built-in functions for data manipulation and summarization. Refactoring the code to be more concise using dplyr makes it easier to read and understand the steps involved in the data manipulation. It also reduces the number of nested loops and intermediate variables, making the code more efficient and reducing the chance of errors. Ultimately, it is your job as a programmer to use your best judgment on the fine balance between brevity and readability, as the best solution is not always this obvious. Remember that you can regenerate the AI’s response if you want more options. 4.10 Code optimization Sometimes time and efficiency are of the essence. Loops can be slow and time-consuming in many languages. Take for example the following R code that, for each element of x, multiplies it by every element of y and sums the products, using nested for loops: set.seed(123) n <- 10000 x <- rnorm(n) y <- rnorm(n) z <- numeric(n) # Start timer start_time <- proc.time() for(i in 1:n) { for(j in 1:n) { z[i] <- z[i] + x[i]*y[j] } } # End timer proc.time() - start_time ## user system elapsed ## 280.405 0.068 280.464 What is a faster, more optimized way of running the following R code?
set.seed(123) n <- 10000 x <- rnorm(n) y <- rnorm(n) # Start timer start_time <- proc.time() z <- outer(x, y, "*") z <- rowSums(z) # End timer proc.time() - start_time ## user system elapsed ## 0.733 0.308 0.942 The outer() function performs the same calculation as the nested loop in the original code, but more efficiently. It returns a matrix of all possible combinations of x and y values, with each element of the matrix being the product of the corresponding x and y values. The rowSums() function is then used to sum the elements of each row of the matrix, which is equivalent to summing the products of x and y for each index i in the original loop. This method avoids the need for the nested loop, resulting in a faster and more efficient computation. So much faster! We can look at another option by regenerating the response: set.seed(123) n <- 10000 x <- rnorm(n) y <- rnorm(n) z <- numeric(n) # Start timer start_time <- proc.time() z <- x %*% t(y) # End timer proc.time() - start_time ## user system elapsed ## 0.424 0.340 0.496 One optimized way to perform the same calculation is by using the %*% operator to perform matrix multiplication. This can be done by converting x and y to matrices and transposing one of them so that their dimensions align for matrix multiplication. This code should be much faster than the original implementation because it takes advantage of highly optimized matrix multiplication algorithms in R. While this second suggestion is faster, it is not a drop-in replacement: x %*% t(y) returns the full n-by-n matrix of products, so you would still need rowSums() to recover the vector z computed by the original loop. Always check that an AI-suggested optimization returns the same result as the code it replaces. You will also need to consider what aspects of the codebase are most important in each instance. For example, this code runs more quickly, but the %*% operator might be unfamiliar to some R programmers. In cases where efficiency is less important, or the data are not large, you might consider maximizing readability. 4.11 Challenges and Limitations Like humans, AI can make mistakes. Be sure to keep the following in mind as you use AI to refactor your code.
4.11.1 Complexity Refactoring is often a complex process that requires deep understanding of the code and its context. AI may not always be able to fully understand the complexity of the code and may struggle to identify the best refactoring strategy. Use a modular approach whenever possible. 4.11.2 Limited Data AI models require large amounts of data to learn from, but in the case of code refactoring, there is often limited data available. This can make it difficult for AI models to generalize to new code bases and situations, especially if you are using a more niche programming language. 4.11.3 Quality Control Automated refactoring tools that use AI may not always produce code that is of the same quality as code produced by human developers. It can be difficult to always ensure that the refactored code is maintainable, efficient, and free of bugs. You need to use your best judgment when copying and pasting AI-produced code into your codebase. You should always include unit tests in your code. Tests can help you catch bugs, including those introduced accidentally by AI. Because AI models are created by humans, they can be biased. This means they may not always identify your preferred refactorings or may prioritize certain types of refactorings over others. In some cases, this can lead to suboptimal code quality and may create technical debt over time. 4.11.4 Security When using AI to refactor code, the code itself is often sent to an external service or platform for analysis and transformation. This can raise concerns about the security of the code, especially if it contains sensitive information such as trade secrets, proprietary algorithms, or personal data. If your code is sensitive, it’s important to carefully vet any third-party AI tools or services used in the refactoring process. 4.12 Hands-On Exercise Now it’s your turn to try. 4.12.1 The Code Let’s say you are dusting off some code from your past (no judgment here). You were investigating tweets about Mr. 
Trash Wheel, a beloved Baltimore-based contraption that filters trash out of the waterways. Note: This code is just an example and was written strictly for educational purposes. import tweepy import pandas # Enter your API keys and access tokens here consumer_key = 'your_consumer_key' consumer_secret = "your_consumer_secret" access_token = 'your_access_token' access_token_secret = 'your_access_token_secret' # Authenticate with Twitter API auth = tweepy.OAuthHandler(consumer_key, consumer_secret) auth.set_access_token(access_token, access_token_secret) # Search for tweets containing the search term tweets = tweepy.Cursor(tweepy.API(auth).search_tweets, q="Mr. Trash Wheel", tweet_mode='extended').items(100) # Create empty list to store tweet data date_data = [] location_data = [] text_data = [] def get_tweet_length(tweet): # Return the length of the tweet text return len(tweet.full_text) # Loop through each tweet and extract desired data for tweet in tweets: date_info = { 'date': tweet.created_at } date_data.append(date_info) # Search for tweets containing the search term tweets = tweepy.Cursor(tweepy.API(auth).search_tweets, q="Mr. Trash Wheel", tweet_mode='extended').items(100) # Loop through each tweet and extract desired data for tweet in tweets: location_info = { 'location': tweet.user.location } location_data.append(location_info) # Search for tweets containing the search term tweets = tweepy.Cursor(tweepy.API(auth).search_tweets, q='Mr. Trash Wheel', tweet_mode='extended').items(100) # Loop through each tweet and extract desired data for tweet in tweets: text_info = { 'text': tweet.full_text } text_data.append(text_info) # Combine lists into a dictionary data = {'date': date_data, 'location': location_data, 'text': text_data} # Store results in pandas dataframe df = pandas.DataFrame(data) # Print dataframe print(df) 4.12.2 Questions Create an AI prompt that fixes any formatting issues with the code that would cause it not to run. 
Devise an AI prompt that removes any dead code from your sample above. What gets removed? Create a prompt that makes the code less repetitive, adhering to the DRY principle. What aspect of the code was repetitive? Construct a prompt that makes the code more concise. What are some trade-offs that appear in this code between readability and brevity? 4.13 Summary Code refactoring is the process of improving code quality without changing its functionality. It is crucial in software development to maintain a manageable and adaptable codebase. Code refactoring reduces technical debt, improves code stability, and makes it easier to maintain. Examples of using AI for code refactoring include correcting syntax, adhering to styling and convention, visual styling, avoiding repetition, removing dead code, and improving both readability and speed of execution (optimization). The use of AI for code refactoring raises ethical concerns and is not perfect. It is important for the developer to consider security needs of their code, as well as test out their code. "],["annotating-your-code.html", "Chapter 5 Annotating Your Code 5.1 Learning Objectives 5.2 Annotating Code is Good Practice 5.3 Improves readability 5.4 Improves maintainability 5.5 Improves the quality 5.6 Examples of quality aiding comments 5.7 AI in Code Annotation 5.8 Ethics of using AI to annotate code 5.9 Examples on using AI for annotation 5.10 Summary", " Chapter 5 Annotating Your Code 5.1 Learning Objectives Explain the significance and benefits of code annotation and how it aids in understanding and working with code in the future. Demonstrate the ability to apply effective code annotation practices by providing clear and concise comments in code. Recognize the impact of code annotation on code maintainability. Explain how well-annotated code can facilitate collaboration among developers and ensure code consistency. 
Critically evaluate the use of AI in code annotation, including benefits, ethical considerations, and limitations. Explore how AI can be used to create README files, do line-by-line annotations, and offer potential code improvements. 5.2 Annotating Code is Good Practice Code annotation, also known as code commenting, is the process of adding explanatory notes to source code. These notes are used to provide context, clarify functionality, and aid in understanding for developers who may be working on the codebase in the future. Code annotation is an important practice for any developer looking to write clear, efficient, and maintainable code. There are many benefits to annotating code: Improves readability Improves maintainability Improves quality 5.3 Improves readability First, it can help to improve the readability of code. By adding comments, developers can explain the purpose of each section of code, which can make it easier for others to understand what the code is doing. This can be especially helpful for large or complex codebases, where it can be difficult to keep track of all the different components. This is particularly important in contexts where the original developer of a project may move on to something else and others are left to work on it. Annotation perhaps most often helps a developer remember things they knew about the code when they were originally writing it (annotation is helpful for future you!). If the original developer left well-annotated code, it can drastically improve the ability of others (and their future selves) to continue with the project. 5.3.1 Examples of readability aiding comments Comments that help readability clarify what the code is doing. Function explanations: A comment at the beginning of a function or method can describe its purpose, parameters, and expected return values. This makes it easier for others to understand what the function does and how to use it.
# This function calculates and returns the sum of two numbers (x and y) def add_numbers(x, y): return x + y Inline explanations: Use inline comments to explain what a specific line or block of code does. This can help someone reading your code to quickly understand what’s going on. # Loop through each item in the list for item in my_list: # Check if the item is greater than 10 if item > 10: # Print the item print(item) TODO comments: Use TODO comments to mark areas of your code that need further work or improvement. This can help you or others to remember to come back to a specific part of the code. # TODO: Implement error handling for this function def my_function(): pass Parameter descriptions: If a function or method has complex parameters, it can be helpful to add comments explaining what each parameter does. def my_function(parameter1, parameter2): """ Calculate the sum of two numbers. Parameters: parameter1 (int): The first number to be added. parameter2 (int): The second number to be added. Returns: int: The sum of parameter1 and parameter2. """ return parameter1 + parameter2 Code block summary explanations: If you have a long or complex code block, you can add a comment to explain what the block is doing. # This code block creates a dictionary containing the counts of each word in a list of text strings word_counts = {} for text in text_list: for word in text.split(): if word not in word_counts: word_counts[word] = 0 word_counts[word] += 1 5.4 Improves maintainability Code annotation can help to improve the maintainability of code. By adding comments, developers can explain the reasoning behind certain decisions, which can make it easier for others to make changes to the code without breaking it. This can be especially helpful when multiple developers are working on the same codebase, as it can help to prevent conflicts and ensure that the code is always in a consistent state. 
5.4.1 Examples of maintainability aiding comments Comments that help maintainability explain the historical context of why code was made the way it was. Design decisions: If there were specific design decisions made when creating the code, you can add comments explaining why certain choices were made. # We chose to use a linked list data structure for this function to reduce the time complexity of inserting and deleting elements. Legacy code: Sometimes, code may have been written in a certain way due to constraints or limitations at the time it was created. Adding comments to explain this can help others understand why the code is the way it is. # This code was written before Python 3, which introduced the `yield from` syntax. Therefore, we used a `for` loop to iterate over the nested list. Performance optimizations: If certain performance optimizations were made to the code, you can add comments explaining why they were necessary. # We used memoization to improve the time complexity of this recursive function, as it was taking too long to execute for larger inputs. Compatibility considerations: If the code was written with compatibility considerations in mind, you can add comments explaining why certain choices were made. # We used the `os.path` module to ensure that this code will work on both Windows and Unix-based systems, as the path separators are different on each platform. Limitations: If there are limitations or edge cases that the code cannot handle, you can add comments to explain this to others. # Note that this function assumes that the input array is sorted in ascending order. If the array is unsorted, the results may be incorrect. 5.5 Improves the quality Code annotation can help to improve the quality of code. By adding comments, developers can identify potential issues or edge cases, which can help to prevent these issues from occurring in the first place.
This can be especially helpful when testing code, as it can help to ensure that the code is working as expected. 5.6 Examples of quality aiding comments Comments that help improve the quality of the code explain to others how to use it or how to help improve it. Error handling: Adding comments to explain how and why error handling is being implemented can help ensure that your code is robust and able to handle unexpected inputs or errors. This helps others know how the code was intended to be used. # If the input argument is not a list or is empty, raise a ValueError if not isinstance(input_list, list) or len(input_list) == 0: raise ValueError("Input must be a list that is not empty") Complexity: If your code has particular complexities, adding comments that explain them can help others understand the performance characteristics of your code. It may help others identify whether there is a simpler way to write the code. By adding a comment that expresses uncertainty about the code and asks for suggestions, the author can potentially receive feedback from others on how to improve the code. # This block of code could probably be simplified, but I'm not sure how. new_list = [] for i in old_list: if i > 0: new_list.append(i) Constants and variables: Adding comments to explain the purpose of constants and variables can make your code easier to use. It not only notifies others of the variable’s existence but lets them know if they need to change the parameters for their own purposes. # This constant represents the maximum allowed number of retries when attempting to connect to the server. MAX_RETRIES = 3 # This variable tracks the number of failed attempts to connect to the server. num_retries = 0 Code organization or disorganization: Adding comments to explain the organization and structure of your code can make it easier for others to navigate and may bring out a better organizational strategy for your code.
# Define helper functions for data processing def preprocess_data(input_data): # Implementation details def analyze_data(processed_data): # Implementation details def visualize_data(analyzed_data): # Implementation details # Load data from file input_data = load_data("input.txt") # Preprocess data processed_data = preprocess_data(input_data) # Analyze data analyzed_data = analyze_data(processed_data) # Visualize data visualize_data(analyzed_data) Assumptions and constraints: Adding comments to explain the assumptions and constraints of your code can help others understand its limitations and potential pitfalls. # This function assumes that all input values are positive integers. Negative or non-integer inputs may result in unexpected behavior. 5.7 AI in Code Annotation We’ve discussed why code annotation and documentation are important, but when and why would you use AI for code annotation? AI can be a nice tool to supplement the annotation of your code. It should not be the only source of annotation for your code, because as the code author, you need to verify that what AI has annotated is consistent with your knowledge and history of development of the code. But using AI to annotate your code can be useful to supplement existing code annotations or to annotate old code that was poorly annotated either by yourself or others who are no longer working on the project. Here are some of the benefits of using AI for code annotation: Speed and efficiency: AI algorithms can analyze code much faster than humans, which means they can quickly generate comments and annotations for large codebases. This can save developers significant time and effort, allowing them to focus on other aspects of the development process. Consistency: Unlike humans, AI is not affected by personal biases or preferences, so it can provide consistent annotations across different code files and projects. 
A human may underestimate places in the code that should have annotation, whereas an AI might be more consistent at putting annotation in these places. This can help ensure that all code in a project is well-documented and easy to understand. Objectivity: AI can analyze code objectively and identify potential issues that may have been missed by humans. This can lead to better quality code that is easier to maintain and less prone to errors. Learning: AI models can learn from large datasets of annotated code and improve their ability to generate comments and annotations over time. This means that the more code the AI model analyzes, the more accurate and effective it becomes at annotating code. Specificity: AI models can be trained on specific programming languages, frameworks, or libraries, allowing them to generate language-specific comments and annotations that are tailored to the needs of the project. This can be particularly helpful for large, complex projects that require specialized knowledge or expertise. 5.8 Ethics of using AI to annotate code However, there are also a number of potential ethical concerns associated with using AI to annotate code. For example, we don’t yet know how and in what ways AI models may be biased. Additionally, AI models may be opaque, which could make it difficult to understand why they made certain annotations. Finally, because AI models are not humans and don’t necessarily tell the stories behind the code, they may be used to generate annotations that are not accurate, helpful, or do not tell the full depth of the history of what occurred with the code, which could lead to problems. Given these potential benefits and concerns, it is important for users of AI to always realize that they are ultimately accountable for the annotation that an AI model makes, and careful review of this annotation is needed. There are a number of ethical considerations to take into account when using AI to annotate code. 
Here are a few of the most important considerations: Accountability: First and foremost, a user of AI is always primarily accountable for any output that they use from an AI model. AI models can give you annotation to start with, but it is up to you as the user to verify and review this output carefully. The user of the AI model is ultimately responsible for keeping or throwing out the annotations the AI makes and thus is responsible for any errors from the model that they keep. Much like a user of a Google Search engine is responsible for which results they use, a user of AI is responsible for what output from the AI model they use. Transparency: It is important to be transparent about the use of AI in code annotation. This means disclosing the fact that AI is being used, as well as the specific AI model and dataset that is being used. This should be stated on every file where annotation has been made using AI. This transparency allows others who view the code to more fully interpret the AI-created annotation that accompanies the code. Bias: AI models are trained on data, and this data can introduce bias into the model. It is important to be aware of the potential for bias and to take steps to mitigate it. This can be done by using a diverse dataset, by carefully selecting the features that are used to train the model, and by using techniques such as adversarial training. It is important to provide annotations that indicate any known biases, possible limitations associated with bias, and any strategies that were used to mitigate bias. Explainability: It is important to be able to explain the decisions that an AI model makes. This is especially important in the case of code annotation, where the decisions made by the AI model can have a significant impact on the quality of the code. There are a number of techniques that can be used to explain the decisions of an AI model, such as visualization and rule extraction.
You can (and should) ask a chatbot to explain its sources and rationale for output that it gave. (Be aware that the citations it gives may or may not be accurate, and you need to verify them by doing your own follow-up literature search.) 5.9 Examples on using AI for annotation 5.9.1 Ask for a README summary To try out some annotation with code examples, this author dug up some old code whose purpose she could no longer remember, because time had passed and she hadn’t documented it very well. First, I wanted ChatGPT to make a README for me. A README is a standard file that explains what the code base does and summarizes what is happening in it. I asked ChatGPT if it could: Create a README for this code and then I pasted the old code. ChatGPT gave me some output that generally looked like a README: This README seems accurate given the work I was doing at the time. But obviously the title isn’t great: it doesn’t describe the code well. Plus, there isn’t really any information about the usage or outputs of the code. So I asked it: Can you make this README more succinct and with sections about how you would use this code? This yielded a much better result. 5.9.2 Ask for line by line annotation Second, I wanted to have ChatGPT annotate code line by line. So I asked it: Can you annotate some R code for me? It not only provided helpful annotations, but it also fixed spacing and added more space between lines of code: It does, however, apparently think the code is Python, if we look in the upper left corner. But this doesn’t seem to have affected the accuracy of its annotations. 5.9.3 Ask for how the code could be improved This code is old; I don’t think it runs properly, and it is in no way reproducible. So I wanted to see what ChatGPT would recommend for improvements to the code. I said How could I make this code better?
It had a lot of great advice: First and foremost, it told me my variable names are neither consistent nor descriptive, which is completely accurate! There are variable names like xx and ll which don’t tell the reader anything about the variable. 5.9.4 Style the code Next, I asked ChatGPT if it could restyle the code into tidyverse style. It definitely added elements that look like tidyverse; however, it also added items that I could tell, just by looking at them, won’t work. So I tried to tell it about the errors: This code is close but there are some errors. These tidyverse functions don’t work with matrices. and it told me: You are correct that the tidyverse functions are designed to work with data frames and not matrices. Here's a modified version of your code that converts the matrix to a data frame and uses tidyverse functions: The lesson here is that the output of ChatGPT still needs to be vetted by the person asking for it. It is a great idea to keep having an AI work on something if the output isn’t quite what you are asking for; however, at the end of the day it is you, the human, who has to vet the output. 5.9.5 Sometimes it has trouble with file types if given a URL Copying and pasting code appears to be the best way to go at this point in time. I asked ChatGPT if it could annotate code in a Jupyter notebook at a URL I gave it: Can you annotate the code at this URL? https://github.com/jhudsl/reproducible-python-example/blob/main/make_heatmap.ipynb It responded with annotated versions of some of the code in that file, but most of the notebook was missing. I told ChatGPT: That isn’t all the code at that url ChatGPT told me: You're right, I apologize. The code at the given URL is actually in a Jupyter Notebook format, so it's not easy to copy-paste here. However, I can suggest an alternative approach: you can use the website nbviewer (https://nbviewer.jupyter.org/) to view the notebook and its code.
Here are the steps to do so: Go to https://nbviewer.jupyter.org/ Paste the URL of the notebook https://github.com/jhudsl/reproducible-python-example/blob/main/make_heatmap.ipynb into the "Enter the location of a Jupyter Notebook" field and press "Go". This will display the notebook and its code, and you can read through it and add your own annotations as needed. I hope this helps! This was a nice tool to recommend, however it still told me I should add my own annotations at the end of this. 5.10 Summary Code annotation is important for readability, maintainability, and overall quality of code AI can be a helpful tool for supplementing annotations or creating annotation for poorly annotated code. Ultimately the code author who uses AI as a tool for annotation is still responsible for verifying that the annotations are consistent with their knowledge of the code. devtools::session_info() ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2023-07-14 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2023-03-28 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.0.2) ## cachem 1.0.7 2023-02-24 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## curl 4.3 2019-12-02 [1] RSPM (R 4.0.3) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) ## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) ## evaluate 0.20 2023-01-17 [1] CRAN (R 4.0.2) ## fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0) ## fastmap 1.1.1 
2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## highr 0.8 2019-03-20 [1] RSPM (R 4.0.3) ## hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0) ## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.0.2) ## httr 1.4.2 2020-07-20 [1] RSPM (R 4.0.3) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2023-03-28 [1] Github (yihui/knitr@a1052d1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.0.2) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## ottrpal 1.0.1 2023-03-28 [1] Github (jhudsl/ottrpal@151e412) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.3) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 [1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## readr 1.4.0 2020-10-05 [1] RSPM (R 4.0.2) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.0 2023-03-14 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2023-03-28 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.2) ## sass 0.4.5 2023-01-24 [1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2023-03-28 [1] Github (R-lib/testthat@e99155a) ## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.0.2) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## utf8 1.1.4 2018-05-24 [1] RSPM (R 4.0.3) ## vctrs 0.6.1 2023-03-22 [1] CRAN (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2023-03-28 [1] Github (yihui/xfun@74c2a66) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["understanding-unfamiliar-code.html", "Chapter 6 
Understanding Unfamiliar Code 6.1 Learning Objectives 6.2 Reading Unfamiliar Code Is A Skill 6.3 AI in Reading Code 6.4 Example One: Summarizing Code You Didn’t Write 6.5 Example Two: Identifying a Coding Language 6.6 Example Three: Interpreting Regex 6.7 Example Four: Demystifying Functions 6.8 Limitations 6.9 Hands-On Exercise 6.10 Summary", " Chapter 6 Understanding Unfamiliar Code 6.1 Learning Objectives Recognize the benefits, limitations, and assumptions in using AI to understand unfamiliar code Consider the ethical questions around using AI to understand another person’s code Practice using AI to summarize code, identify an unknown coding language, and interpret regex and unknown functions 6.2 Reading Unfamiliar Code Is A Skill As a programmer, you will frequently encounter situations where you need to read and understand code written by other developers. This could be because you are collaborating on a project, you have taken over maintenance of an existing codebase, or you are learning new syntax or skills that require reading example codes. Comprehending code written by someone else is a distinct skill from being able to write your own code, and it takes practice to become proficient at it. Reading unfamiliar code is like exploring an unfamiliar city without a map or a guide. Just like in a new city, you may not know where to start or how to navigate the codebase. You may encounter unfamiliar syntax, functions, and libraries that you have never seen before, just as you might encounter new streets, buildings, and landmarks. At first, you may feel disoriented and overwhelmed, and may need to spend some time getting oriented and familiarizing yourself with the environment. As you explore, you may start to see patterns and similarities, just as you might begin to recognize neighborhoods and landmarks in a new city. You may also encounter dead-ends, confusing intersections, and unexpected detours, just as you might encounter bugs and errors in the code. 
To make progress in this unfamiliar territory, you may need to rely on a combination of intuition, deduction, and experimentation. You may need to break down the code into smaller pieces, analyze the behavior of each piece, and gradually build up a mental model of how the code works. You may also need to consult documentation, online resources, or other experts who are familiar with the codebase, just as you might ask locals or consult a map in a new city. In the end, with patience, persistence, and a willingness to learn, you can successfully navigate the unfamiliar code and achieve your goals, just as you can successfully explore a new city and discover its hidden treasures. Learning how to read and understand code written by someone else is a valuable skill that can help you be a more effective programmer. By being able to read and comprehend unfamiliar code, you can contribute to projects more quickly, diagnose and fix bugs more effectively, and learn new programming techniques and best practices. As with any skill, the more you practice reading and understanding code written by others, the more comfortable and proficient you will become at it. 6.3 AI in Reading Code While anyone can learn to understand code written by someone else, using AI language models to achieve this has three significant advantages. Summarize large codebases. AI can quickly scan large codebases and give you a high-level overview of what they do. This can save you a lot of time and effort, especially when dealing with complex codebases. For example, AI can identify the main functions and classes in a codebase, and it can show you how they are related. Analyze the code. AI can look at the code and tell you about its structure, complexity, and potential defects quickly and efficiently. This can help you understand the code better and make it easier to maintain and improve. 
For example, AI can identify parts of the code that are difficult to understand or maintain, and it can suggest ways to improve the code. Understand comments and documentation. AI can read comments and documentation written by other developers. This can help you understand the code’s purpose and the assumptions and limitations of the codebase. For example, AI can identify the key assumptions that are made in the code, and it can explain the trade-offs that were made in the design of the code. Keep these ethical considerations in mind when using AI to summarize code. Who owns the code? If you are not the owner of the code, you should get permission from the owner before summarizing it. What is the purpose of the summary? If you are summarizing the code for educational or research purposes, it is likely ethical to do so. However, if you are summarizing the code for commercial purposes, you may need to get permission from the owner. Does the code contain any sensitive or personal information? If the code contains any sensitive or personal information, you should take special care to protect that information. 6.4 Example One: Summarizing Code You Didn’t Write Here’s some simple code using the R programming language, written by an unnamed programmer. If you are familiar with R, you might be able to understand what this code does with little effort. However, if you don’t know R, you’re relatively new to R, or you’re just having a bad day, understanding what this bit of code does can take a great deal more energy and time. AI language bots can ease that burden.
library(tidyverse) d <- ggplot2::diamonds d_sel <- d %>% select(carat, cut, color, clarity, depth, table, price) d_fil <- d_sel %>% filter(price > 5000) d_bin <- d_fil %>% mutate(carat_bins = cut_width(carat, width = 0.2)) d_summ <- d_bin %>% group_by(carat_bins, cut, color, clarity) %>% summarize(mean_price = mean(price), sd_price = sd(price)) We first ask Claude-instant (as implemented by Poe) to explain the purpose of this code. What does this code do? You can ask for Claude to summarize each step in detail as well. Can you explain what each step does in detail? Maybe you’ve never worked with the tidyverse package in R - Claude can tell you about it. What is the tidyverse package? You might also not be familiar with the diamonds dataset (especially if you’ve never used it before) and want some clarification on where the data are from. Can you tell me more about the diamonds dataset? 6.5 Example Two: Identifying a Coding Language Sometimes you might have to work with legacy code. Legacy code can be difficult to work with, especially if it is written in a language or style with which you are not familiar. There are more than 700 programming languages in use today, so it is impossible for any programmer to know them all. AI can be a helpful tool for identifying the language and version of legacy code, which can make your life just a little easier. Let’s look at example code that might have been written decades ago. 
program temperature_smog_analysis; uses Math; const n_temperatures = 1000; n_smog_measures = 500; type TemperatureArray = array[0..n_temperatures-1] of integer; SmogArray = array[0..n_smog_measures-1] of integer; var temperatures: TemperatureArray; smog_measures: SmogArray; combined_data: array of integer; slope, y_intercept: double; i, j: integer; function connect_to_database(filename: string): integer; begin { implementation of connect_to_database function } end; function retrieve_temperatures(fd: integer): integer; begin { implementation of retrieve_temperatures function } end; function retrieve_smog_measures(fd: integer): integer; begin { implementation of retrieve_smog_measures function } end; function retrieve_temperature(fd, index: integer): integer; begin { implementation of retrieve_temperature function } end; function retrieve_smog_measure(fd, index: integer): integer; begin { implementation of retrieve_smog_measure function } end; procedure close_database(fd: integer); begin { implementation of close_database function } end; We can ask Bard to take a guess at what the coding language might be. What language is this code written in? We can explore further to figure out what indicators tell Bard the language is Pascal. How do you know this is Pascal? It turns out there are multiple versions of Pascal. Since you will presumably need to work with this legacy code, you might want to know which version it is. What version of Pascal is this? 6.6 Example Three: Interpreting Regex Regex (short for Regular Expressions) is a powerful tool used to describe patterns in text. It’s a sequence of characters that define a search pattern. Regular expressions can be a powerful tool for data cleaning, text mining, and data validation. They are widely used in web development, data science, and other fields where text processing is important. They can also be tricky to understand at first because they involve a specific syntax that can be complex. 
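To make the idea of a search pattern concrete, here is a minimal Python sketch of the validation and cleaning uses mentioned above. The sample IDs and the pattern are hypothetical, made up purely for illustration, and are not taken from any code discussed in this course.

```python
import re

# Hypothetical sample IDs, made up purely for illustration.
sample_ids = ['patient_0042', 'patient_17', 'control_0007', 'Patient_0100']

# Pattern: the literal text 'patient_' followed by exactly four digits.
# [0-9]{4} means "four characters, each between 0 and 9".
pattern = re.compile('patient_[0-9]{4}')

# Data validation: keep only the IDs that match the whole pattern.
valid_ids = [s for s in sample_ids if pattern.fullmatch(s)]

# Data cleaning: replace the first run of digits with a placeholder.
masked = [re.sub('[0-9]+', 'XXXX', s, count=1) for s in valid_ids]

print(valid_ids)
print(masked)
```

Matching is case-sensitive by default, so 'Patient_0100' is rejected along with the IDs that have the wrong prefix or the wrong number of digits. Small details like this are part of why it can help to have a pattern explained piece by piece.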
We can use ChatGPT to explain regex in a bit of someone else’s code. x = sub("/d/e","/d",x) What does this do? 6.7 Example Four: Demystifying Functions Sometimes we get handed code that includes complex architecture. Perhaps you are working with code you wrote years ago when you really liked loops, but present you finds them difficult to parse. (Be nice to past you - maybe you were a baby programmer and didn’t know better.) Instead of giving yourself a headache, you could turn to ChatGPT to explain what your old code does. def my_function(x): result = x for i in range(10): for j in range(5): result = result + 2 * (i + 1) * (j + 1) * (i % 2 == 0 and j % 2 == 0) - 1 return result What does this function do? Well, now you have an idea what past-you might have been attempting to do with this code. You can also query AI about the potential problems from using the code as-written. What are some possible issues with this code? (And if you do ever run across an expression like this in your code, you can check out our chapter on refactoring code using AI to fix it!) 6.8 Limitations Although chatbots can be extremely helpful for better understanding someone else’s code, if there is a lack of annotation or documentation, it will be difficult to discern more about the context of the development. Cases where developers may not have provided context information include: Not stating their purpose for writing this code. Not describing what they planned to do next. Not describing (or being aware of) possible parts of the code that need updating or maybe have security or privacy issues. Not describing how they made certain decisions in the development process. Although we can’t truly understand some of this information, it is however possible to get some assistance from chatbots with prompts such as: What do you think the purpose of this code is? What might be possible next steps to further develop this code? 
What are possible issues with this code, particularly for security or privacy? Why was the code possibly written with this structure? What other options are possible? Chatbot tools are also limited in terms of how up-to-date their training data is to know about current possible issues with code. 6.9 Hands-On Exercise Now it’s your turn to try. Let’s say you were handed this bit of code to work with (and eventually modify), but first you need to figure out what’s going on with it. Unfortunately, the original programmer left very little in the way of notes or annotation. 6.9.1 The Code Note: This code is just an example and was written strictly for educational purposes. use ode_solvers::{Euler, OdeMethod}; use plotters::prelude::*; use std::fs::File; const N: f64 = 1000000.0; const I0: f64 = 10.0; const R0: f64 = 0.0; const S0: f64 = N - I0 - R0; const BETA: f64 = 0.3; const GAMMA: f64 = 0.1; fn sir_model(t: f64, y: &[f64], v: f64) -> Vec<f64> { let s = y[0]; let i = y[1]; let r = y[2]; let dsdt = -BETA * s * i / N - v * s; let didt = BETA * s * i / N - GAMMA * i; let drdt = GAMMA * i + v * s; vec![dsdt, didt, drdt] } fn main() { let mut solver = Euler::new(sir_model); solver.set_initial_condition(&[S0, I0, R0]); let root = BitMapBackend::new("sir_vaccination.png", (800, 600)).into_drawing_area(); root.fill(&WHITE).unwrap(); let mut chart = ChartBuilder::on(&root) .caption("SIR model with vaccination", ("sans-serif", 40)) .set_label_area_size(LabelAreaPosition::Left, 60) .set_label_area_size(LabelAreaPosition::Bottom, 40) .build_cartesian_2d(0.0..100.0, 0.0..N) .unwrap(); let mut data1 = vec![]; for _ in 0..1000 { let y = solver.integrate(0.1, 0.0); data1.push((solver.time(), y[0], y[1], y[2])); } let vac_rate = 0.1; let vac_coverage = 0.5; let vac_num = vac_coverage * N; let mut data2 = vec![]; for i in 0..1000 { let t = i as f64 * 0.1; let v = if t >= 50.0 && t < 150.0 {vac_num * vac_rate} else {0.0}; let y = solver.integrate(0.1, v); data2.push((solver.time(), 
y[0], y[1], y[2])); } chart .configure_mesh() .x_labels(10) .y_labels(10) .disable_x_mesh() .disable_y_mesh() .draw() .unwrap(); chart .draw_series(LineSeries::new( data1.iter().map(|d| 6.9.2 Questions Write a query that identifies the features that can be used to determine the coding language, as well as the version of the language. What language and version is it written in? What features give away the coding language? Create a prompt that determines what this code does, as well as important features of the program. What does this code do? Write a prompt that determines the purpose of this specific code snippet: let root = BitMapBackend::new("sir_vaccination.png", (800, 600)).into_drawing_area(); What is the purpose of this line of code? 6.10 Summary Understanding code written by someone else can be made easier by using AI AI chatbots can summarize the code and the dataset used, as well as clarify functions and expressions that are confusing AI chatbots can also identify the coding language and version used in legacy code Be mindful of who owns the code and what the summary will be used for when using AI to summarize someone else’s code "],["ai-for-bioinformatics.html", "Chapter 7 AI for Bioinformatics 7.1 Learning Objectives 7.2 AI in Bioinformatics 7.3 Using these topics for Bioinformatics 7.4 Hands-On Exercise 7.5 Summary", " Chapter 7 AI for Bioinformatics 7.1 Learning Objectives Understand the potential applications of AI assistance in bioinformatics software development Identify the challenges and considerations when using AI in bioinformatics software development Grasp the serious ethical concerns when designing bioinformatics software that deals with sensitive data 7.2 AI in Bioinformatics Bioinformatics is an interdisciplinary field that combines biology, computer science, statistics, and data analysis to extract insights from biological data. 
It involves the development and application of computational tools, algorithms, and databases to store, organize, analyze, and interpret vast amounts of biological information, particularly in the context of genomics and public health. Biomedical imaging and electronic health record informatics are also rapidly growing subdisciplines. Bioinformatics is especially important for research and work done with genomes. In genomics, bioinformatics enables the identification and annotation of genes, the prediction of protein structures and functions, and the analysis of genetic variations, such as single nucleotide polymorphisms and gene copy number variations. These analyses help in unraveling the genetic basis of diseases, discovering potential drug targets, and facilitating personalized medicine approaches. The proliferation of genomic data in recent years has been transformative for the field of genomics. Advances in DNA sequencing technologies, particularly next-generation sequencing, have revolutionized the ability to generate vast amounts of genomic data quickly and at a reduced cost. This has led to an unprecedented increase in the availability and accessibility of genomic information. Researchers estimate that we will generate between 2 and 40 exabytes of genomic data within the next decade. Cloud computing platforms like AnVIL are at the forefront of addressing the big genomic data storage and processing challenge. You have seen principles in previous chapters that likewise apply to creating bioinformatics software. Creating bioinformatics software, however, comes with its own unique set of challenges: Data Complexity: Genomic data is vast and complex, often consisting of large datasets with multiple dimensions, such as nucleotide sequences and quality scores, gene expression levels, and epigenetic modifications.
Developing algorithms and software tools to efficiently handle and analyze this data requires expertise in data management, processing, and interpretation as well as the biological domain knowledge. Diverse Data Types: Bioinformatics software often handles diverse and complex data outside of strings and numeric types, such as Bioconductor’s Single Cell Experiment data class. Knowledge of common data types used in bioinformatics is required to create functional software. Algorithmic Complexity: Developing accurate and efficient algorithms for various bioinformatics tasks, such as sequence alignment, gene prediction, and protein structure prediction, requires deep understanding of biological principles and computational techniques. Designing algorithms that can handle noise, errors, and biological variations is a persistent challenge. Integration with Existing Workflows: Bioinformatics software is often part of a larger analysis pipeline or workflow. Integrating AI-based tools seamlessly into existing bioinformatics pipelines and workflows, ensuring compatibility, and maintaining data integrity throughout the process requires special attention. Ethical Considerations: People are typically more sensitive about their health data when compared to their Amazon purchases. This is for a good reason: there is a painful history of the medical establishment performing experiments on people with less socio-political power. Like medical record data, genomic data is supposed to be carefully protected so that it can’t be used to disadvantage certain individuals. AI tools on the other hand, are not typically designed with security and data sovereignty in mind. 7.3 Using these topics for Bioinformatics When using AI to help generate code examples, perform refactoring, understand code, or create unit tests, you should keep the following in mind: Writing Code: AI can be used to automatically generate bioinformatics code snippets or functions. 
It can provide code suggestions, auto-completion, or even generate code based on given specifications or data classes. This can include creating some unit tests for your functions to ensure they are working as expected. Refactoring Code: AI can aid in refactoring bioinformatics code to improve readability, maintainability, and performance through suggestions for code restructuring. Annotating Code: Bioinformatics tools can be used for clinical decision making. It’s important that anyone vetting or auditing your tool can trace any decision making steps. Therefore, clear understanding and documentation are key. AI can assist in drafting documentation and code annotations, such as through summarizing code functionality, clarifying complex expressions or functions, and/or providing plain-terms explanations of the underlying algorithms. Understanding Code: Although bioinformatics is rapidly evolving, its often specific use cases mean that sometimes developers have to work with code that is quite old or outdated. AI can help identify the coding language and version used in legacy bioinformatics code. This can be valuable when working with older software or when integrating different codebases. NEVER submit chunks of code to AI tools without being 100% sure that they are free from personally identifiable information (PII). PII includes variables like gender, race, birth date, geographic location, and disease diagnosis. If you need to, generate dummy data to test your code. 7.4 Hands-On Exercise Now it’s your turn to try. 7.4.1 The Code You are working on improving an alignment algorithm. Alignments are typically used for comparing samples from individuals to a reference consensus sequence. To make sure things are working, you want to compare the alignment created by an existing function (pairwiseAlignment) to an alignment created by your tool (my_new_Aligner). Note: This code is just an example and was written strictly for educational purposes. 
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") if (!requireNamespace("Biostrings", quietly = TRUE)) BiocManager::install("Biostrings") library(Biostrings) reference <- DNAString("ATCGTACGTA") query <- DNAString("ATCGTAGCTA") # Perform pairwise sequence alignment alignment <- pairwiseAlignment(reference, query) # Perform sequence alignment using your algorithm my_alignment <- my_new_Aligner(reference, query) # Compare pairwise alignment to my_alignment my_evaluation <- my_alignment_evaluator(reference, query, alignment, my_alignment) 7.4.2 Questions Things are going great with the toy data! You decide to make a tiny subset of your data for testing your aligner by copying the top 100 sequences from patient 1. You then submit the aligner and data to Bard. You prompt, “is there anywhere handling this data can be made more efficient?” Why is this problematic? Can AI be used to improve annotation in the code above? Why or why not? You prefer writing loops, but have learned recently that R works more efficiently using apply() functions. Is it okay to ask ChatGPT to refactor a small loop into an apply() function? Why or why not? A colleague gives you a flash drive with some code from years ago. They swear it contains some useful snippets that will improve your aligner, but it’s in a language you haven’t worked with before. You inspect the code, and don’t believe it contains protected data. The algorithm was published many years ago in a leading journal. Is it okay to ask AI what the code is doing? 7.5 Summary Like other types of software, bioinformatics software and tools can benefit from AI assistance for writing, refactoring, annotating, and understanding code. Biological data and workflows can be complex. AI does not replace a thorough understanding of biological processes at play. NEVER submit any protected or sensitive data to AI applications. 
This can constitute a serious, often illegal breach of subject privacy if you are dealing with data from people. "],["languages-and-cloud-computing.html", "Chapter 8 Languages and Cloud Computing Programming Languages AI and Cloud Computing", " Chapter 8 Languages and Cloud Computing Programming Languages Here are commonly used languages and their typical use cases. Python Known for its readability, simple syntax, and comprehensive add-on modules. It is often recommended as a beginner-friendly language. Python is used for scientific computing, data analysis, machine learning, and web development. Many data scientist roles rely on Python. Ruby Similar to Python, Ruby is also known for its readability and user-friendly syntax. It is typically used for web development, building dynamic websites, and creating web applications. JavaScript A popular language for web development, known for its versatility and ease of use. Many of the interactive features of websites rely on JavaScript. Java Java is widely used for enterprise applications, Android mobile apps, and web apps. PHP PHP is easy to learn but can become complex as the application grows. It is typically used for building dynamic web pages, web applications, and content management systems. R Used for statistical computing, data analysis, and data visualization, R is a popular choice for researchers and data scientists. Some beginners find its syntax complex, especially due to differences in the two major dialects: “base” R versus the “tidyverse”. Kotlin Kotlin is a relatively new language that is designed to be more concise and expressive than Java. However, its advanced features and concepts may be challenging for beginners. It is used for developing Android mobile apps, web development, and server-side development. Swift Used for iOS and macOS app development, Swift is known for its strict syntax and steep learning curve.
C# Often used for Windows desktop applications, web applications, and video games, C# can be challenging due to its more complex syntax and strict type system. C++ A powerful language often used for developing high-performance applications and operating systems, C++ can be challenging to learn due to its complexity and the need to manage memory manually. Julia Julia is designed for numerical and scientific computing, data analysis, and machine learning. It has a syntax that is easy to learn and read, making it an ideal choice for data scientists, engineers, and researchers. Julia code can also be compiled to native machine code using the Julia compiler. Because Julia is a relatively new language, it has fewer add-on libraries and resources available compared to more established languages. MATLAB MATLAB is a high-level programming language and development environment commonly used for numerical computation, data analysis, and visualization. It is widely used in engineering, physics, and finance, among other fields. It is generally considered to have a user-friendly syntax and is relatively easy to learn compared to other languages. MATLAB has a development environment called MATLAB Compiler that allows you to package MATLAB code into standalone executables or shared libraries for use on a wide variety of operating systems. Fortran Fortran has been around for decades and has a long history in scientific and engineering communities. It has a more challenging syntax and is considered harder to learn than some of the other languages on this list. It is primarily used for scientific and engineering applications that require numerical computations and simulations such as computational physics, weather forecasting, and computational chemistry. Rust Rust is a systems programming language that focuses on speed, reliability, and memory safety. It is commonly used for developing performance-critical software, such as operating systems, web browsers, and game engines. 
Rust’s syntax is similar to C++, but is often considered to have a steeper learning curve due to its focus on memory safety and ownership concepts. AI and Cloud Computing AI in cloud computing is an exciting and rapidly advancing field that combines artificial intelligence (AI) and cloud computing technologies. Cloud computing provides on-demand access to computing resources over the internet, such as storage, processing, and analytics, while AI uses algorithms and models to enable machines to perform tasks traditionally requiring human intelligence. Developers can easily incorporate AI into applications using cloud-based AI services. For example, Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform offer AI services like speech recognition, image recognition, natural language processing, and machine learning. AI optimizes cloud resources and performance. It allocates resources based on workload demands, optimizes power usage, detects threats, and improves application latency by predicting user behavior and preloading content. However, integrating AI and cloud computing presents challenges around data, security, privacy, ethics, complexity, and management that must be addressed carefully. Large amounts of data are needed to train AI models, especially deep learning models requiring vast labeled data to identify patterns and predictions accurately. Storing and processing huge data can be expensive, impacting cost, scalability, and environment performance. Ensuring data security, privacy, and ethical use also require consideration. AI and cloud computing are complex with specialized skills and knowledge to use effectively. Integration increases complexity, challenging management and maintenance. Careful planning and execution evaluates benefits and limits of AI; selects appropriate models and algorithms; and determines integration into the cloud environment. 
Monitoring evaluates model performance, enabling adjustments to optimize performance and efficiency. As both AI and cloud computing technologies advance, the intelligent integration of the two can provide on-demand access to advanced AI capabilities, optimize resources, facilitate smarter applications, and gain data-driven business insights. Integration challenges around data, security, ethics, and complexity must be managed proactively through evaluation, planning, monitoring, and adjustments to maximize benefits and minimize limitations. When implemented responsibly, AI in cloud computing enables remarkable intelligent and scalable systems transforming organizations and society. "],["about-the-authors.html", "About the Authors", " About the Authors These credits are based on our course contributors table guidelines.     Credits Names Pedagogy Lead Content Instructor Elizabeth Humphries (Writing Code, Understanding Unfamiliar Code) Content Authors Candace Savonen (Annotating Your Code), Carrie Wright (Ethics of Using AI), Ava Hoffman (Refactoring Code) Content Editor Reviewer Howard Baek Technical Template Publishing Engineers Candace Savonen, Carrie Wright, Ava Hoffman Publishing Maintenance Engineer Candace Savonen Technical Publishing Stylists Carrie Wright, Ava Hoffman, Candace Savonen Package Developers (ottrpal) Candace Savonen, John Muschelli, Carrie Wright Art and Design Figure Artists Candace Savonen, Carrie Wright, Ava Hoffman, Elizabeth Humphries Funding Funders CS, CW, and HB were supported by the National Cancer Institute (NCI) under Grant UE5CA254170. AMH and EMH were supported by the AnVIL Project through cooperative agreement awards from the National Human Genome Research Institute with cofunding from OD/ODSS to the Broad Institute (U24HG010262) and Johns Hopkins University (U24HG010263). 
Funding Staff Sandra Ormbrek, Shasta Nicholson   Tools Used: ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2023-07-14 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2023-03-28 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.0.2) ## cachem 1.0.7 2023-02-24 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) ## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) ## evaluate 0.20 2023-01-17 [1] CRAN (R 4.0.2) ## fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0) ## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.0.2) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2023-03-28 [1] Github (yihui/knitr@a1052d1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.0.2) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## ottrpal 1.0.1 2023-03-28 [1] Github (jhudsl/ottrpal@151e412) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.3) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 
[1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## readr 1.4.0 2020-10-05 [1] RSPM (R 4.0.2) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.0 2023-03-14 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2023-03-28 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.2) ## sass 0.4.5 2023-01-24 [1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2023-03-28 [1] Github (R-lib/testthat@e99155a) ## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.0.2) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## utf8 1.1.4 2018-05-24 [1] RSPM (R 4.0.3) ## vctrs 0.6.1 2023-03-22 [1] CRAN (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2023-03-28 [1] Github (yihui/xfun@74c2a66) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["references.html", "References", " References "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] +[["index.html", "AI for Software Development Harnessing the Power of Large Language Models About this Course Available course formats Suggested Activity Context Learning Objectives Lesson Breakdown", " AI for Software Development Harnessing the Power of Large Language Models July, 2023 About this Course This course on AI for software development explores the use of AI large language models such as ChatGPT, Bard, and others and their potential benefits and challenges. Through examples and hands-on activities, students will develop an understanding of the ways in which AI can speed up software development tasks and free up time for more creative and strategic work. 
By the end of the course, students will be equipped to navigate the rapidly changing landscape of software development and use AI chatbots in a way that maximizes benefits and efficiency while limiting harm as much as possible. Available course formats This course is available in multiple formats which allows you to take it in the way that best suits your needs. The material for this course can be viewed without login requirement on this Bookdown website. This format might be most appropriate for you if you rely on screen-reader technology. Our courses are open source, you can find the source material for this course on GitHub. The slides for the images in the course can be found here. Suggested Activity Context Course Audience Professionals looking to improve efficiency Students hoping to learn more about programming Anyone curious about how AI can be harnessed for technology Note: Those completely new to programming will find using AI tools to create software solutions challenging at this time. It is helpful to have some fundamental knowledge to write appropriate prompts and to discern when code is not working as expected. We advise novices to seek expert review. Course Prerequisites Some previous exposure to programming recommended. Assessment Type Hands-on activities following each lesson. 
Learning Objectives Learners will be able to: Explain the basics of AI and its potential for improving software development workflows Implement strategies to write and debug code using AI-based tools Describe the benefits of refactoring code using AI-powered techniques, such as making code readable, keeping it brief, and optimizing code Apply best practices for annotating code using AI Recognize strategies for using AI-based tools to understand and analyze code, such as code comprehension of unfamiliar languages or functions Discuss the challenges and ethical implications of using AI for different aspects of software development Lesson Breakdown Activity Duration Introduction 10 minutes Ethics for Using AI 20 minutes Writing Code 35 minutes Writing Exercise 15 minutes Refactoring Code 35 minutes Refactoring Exercise 15 minutes Annotating Code 25 minutes Understanding Code 25 minutes Understanding Exercise 10 minutes Total 3 hours & 10 minutes "],["introduction.html", "Chapter 1 Introduction 1.1 Learning Objectives 1.2 AI Provides New Opportunities 1.3 AI Code of Ethics 1.4 The AI Chatbots 1.5 Reproducibility With AI Chatbots 1.6 How to Use This Book 1.7 About This Book", " Chapter 1 Introduction The increasing availability of AI chatbots is creating a new landscape in software development and data science. It is important to be aware of the potential benefits and challenges of this technology, so that we can use it in a way that benefits everyone and limits harm as much as possible. On the one hand, AI chatbots have the potential to replace or significantly reduce the workload of programmers. This could lead to job losses in the data science industry. Additionally, there are significant ethical and privacy concerns regarding the use of training data sets to teach these AI systems. For example, if a chatbot is trained on a dataset of biased or discriminatory data, it could perpetuate those biases in its interactions with users. 
On the other hand, AI chatbots also have the potential to create new jobs and opportunities. For example, chatbot developers will be needed to create and maintain these AI systems. Additionally, AI chatbots could be used to automate tasks that are currently done by humans, freeing up programmers to focus on more creative and strategic work of interpreting what the data actually mean. Ultimately, while the future of programming in software development and data science is unclear, there is much to be optimistic about in this brave new world. The advancements in AI and chatbot technology present an opportunity for programmers to work alongside algorithms to create innovative solutions that were previously unimaginable. As with any technological revolution, there will be challenges, but it is up to us to work together to ensure that we use these tools ethically and responsibly, and to create a future that benefits everyone. 1.1 Learning Objectives In this chapter we will demonstrate how to: Consider the potential benefits and limitations of AI for software development Recognize how to use this course, how it was written, and what to expect Explain general concepts for ethical use of AI tools for software development Summarize what Chatbots are and how they work Distinguish between different AI options for software development 1.2 AI Provides New Opportunities We do not believe that AI chatbots will replace programmers, software developers, and data scientists. While AI can be powerful and useful, it is only as good as the training data behind it and the human using it. Therefore, for the time being and the foreseeable future, humans will remain at the forefront of programming innovation, with AI providing valuable support. As new code is often built on top of existing code written by humans, it is unlikely that AI will fully replace the need for human programmers. 
However, we do think that the roles of programmers in software development and data science are changing, and some of the skills required to work in this new environment are slightly different from those needed before. In addition to proficiency in programming syntax, programmers may need to develop a more refined ability to construct queries that enable AI bots to create the desired workflows. Other critical skills such as debugging, troubleshooting, and documentation will continue to be valuable, and can be enhanced through the use of AI. Furthermore, AI creates new opportunities and niches for tasks that were previously difficult, time-consuming, or impossible. This is an exciting time for those working in programming, with great potential to explore new and innovative solutions. As AI technology continues to advance, its role in programming will only become more significant, and it is up to developers to adapt and learn how to use these new tools effectively. We asked OpenAI’s ChatGPT whether AI chatbots would replace human software developers, data scientists, and programmers, and this was the response: “It is unlikely that AI chatbots will replace software developers, data scientists, and programmers entirely. While AI chatbots can perform certain tasks related to software development, data analysis, and programming, they are limited in their abilities and require human oversight and input to function properly. AI chatbots can be used to automate certain tasks such as generating simple code snippets, running tests, and performing simple data analysis. They can also assist in the development process by providing quick answers to commonly asked questions and suggesting possible solutions to programming problems. However, more complex software development, data analysis, and programming tasks require a level of creativity, critical thinking, and problem-solving that AI chatbots cannot currently replicate. 
Software developers, data scientists, and programmers have a deep understanding of the technologies and tools used to develop AI chatbots, as well as the ability to optimize and improve those tools. Furthermore, software development, data analysis, and programming require a high level of human communication and collaboration, which AI chatbots cannot currently replicate. Humans are also essential in making ethical decisions related to AI development, such as ensuring that AI systems are transparent, fair, and unbiased. In summary, while AI chatbots can provide valuable assistance to software developers, data scientists, and programmers, they are unlikely to replace them entirely. Human expertise, creativity, and critical thinking will continue to be essential in developing and optimizing AI systems for the foreseeable future.” 1.3 AI Code of Ethics Our first duty as programmers is to avoid causing harm to people and the world. This includes avoiding both intentional and inadvertent harm. Intentional harm is harm that is caused by a programmer’s deliberate actions. For example, a programmer might create a virus that is designed to harm computers. Inadvertent harm is harm that is caused by a programmer’s negligence or lack of foresight. For example, a programmer might create a program that is buggy and causes data loss, or create a program that is particularly vulnerable to being hacked. We must always be aware of the potential for harm and deliberately take steps to mitigate it. This includes things like: Be aware of the potential for bias. AI systems are trained on data, and that data can reflect the biases of the people who collected it. Programmers should be aware of the potential for bias in their data and take steps to mitigate it. Be transparent about your work. Share information about your code, data, and algorithms. This will help others to understand how your work works and to identify any potential problems. Be accountable for your work. 
Take responsibility for the consequences of your code. Be willing to make changes if necessary. Participate in discussions about AI ethics. There are many online and in-person communities where you can discuss the ethical implications of AI with other people. Advocate for ethical AI. Speak up when you see AI being used in a way that you believe is unethical. Support organizations that are working to promote ethical AI. 1.4 The AI Chatbots Humans have been interacting with AI chatbots for years. In fact, Alan Turing is credited with coming up with the concept for chatbots as early as 1950. Chatbots are software-based systems that interact with humans typically by text or speech inputs, rather than code. They mimic some human activity (“Chatbot” 2023; Abdulla et al. 2022) based on these language inputs. They process the inputs using natural language processing, commonly abbreviated as NLP. NLP is a kind of AI that uses human text or speech and parses the language to determine structures and patterns to extract meaning. NLP uses large amounts of language data (such as books, websites, etc.) to train AI systems to identify these structures and patterns. For example, the AI model might identify when a sentence is a question or a statement by examining various features in a prompt such as the inclusion of a question mark or the use of words often used in questions (“Natural Language Processing” 2023; Cahn 2017). The methods used for chatbots have evolved over time. Now chatbots often utilize AI methods like deep learning (which involves multiple layers of abstractions of the input data (“Deep Learning” 2023)) to extract meaning from the language data (“Natural Language Processing” 2023). As these methods use large quantities of text, they are therefore often called large language models (“Large Language Model” 2023). Despite the fact that chatbots have been around awhile, the popularity of OpenAI’s ChatGPT and DALL-E programs has sparked a recent surge of interest.
These chatbots are in part particularly powerful due to the fact that large amounts of computing power were used to train their NLP models on very large datasets (Caldarini, Jaf, and McGarry 2022; Cahn 2017). Large language model AIs can be divided into two categories: those that can be reached using an internet browser, and those that can be reached using an integrated development environment (IDE). The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information are protected, is unknown. Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure. This is not a comprehensive list! Browser-based chatbots Bard is Google’s large language model, also known as a conversational AI or chatbot trained to be informative and comprehensive. It is trained on a massive amount of text data and is able to communicate and generate human-like text in response to a wide range of prompts and questions. You will need to join the waiting list before getting access, although in our experience the wait time is short. ChatGPT is OpenAI’s large language model based on the GPT (Generative Pre-trained Transformer) architecture. It is designed to understand and generate human-like responses to text-based conversations. You will need to create an account before using it. Phind is an AI-powered search engine designed specifically for developers asking technical questions. In addition to answering your coding question, it will provide links to the internet sources used to find the answer.
Poe is Quora’s platform that gives you access to multiple AI chatbots, including Sage, based on OpenAI’s GPT-3.5 architecture; Claude, an AI natural language bot created by Anthropic; and ChatGPT. IDE-based chatbots CoPilot by GitHub is a language bot that interfaces with your editor. This bot provides coding suggestions and ways to speed up testing. GitHub describes it as an “AI pair programmer”. It uses a subscription model with both individual and business tiers. CodeWhisperer is the Amazon Web Services chatbot for the AWS cloud platform. It can be installed in your integrated development environment (IDE) using the latest AWS Toolkit plugin. The individual tier for CodeWhisperer is free, although you will need to create an AWS account and add a payment method. What are the data sources for these chatbots? Some high-profile English-language AIs, including Google’s T5 and Facebook’s LLaMA, were trained on more than 15 million websites (Schaul, Chen, and Tiku 2023). The top sources of training data for these AI models were https://patents.google.com/ and https://www.wikipedia.org/ ! 1.5 Reproducibility With AI Chatbots Reproducibility in programming is the ability to reproduce the results of an analysis using the same data and methods. It’s an important aspect of working with AI tools as well, particularly when using AI to debug or fix code. In order to maximize the reproducibility of your work, you should create good habits when working with any AI tool. This includes documenting your prompts and the intentions behind them, as well as saving the response threads generated by an AI chatbot. Some of the current chatbots already save your threads (particularly ChatGPT and Phind). At this time, Bard and the chatbots accessible via Poe do not save queries as individual threads but do keep a record of the questions and responses for some time. However, it is best practice to save them yourself as the saved chat threads can be reset. 
Saving them yourself also allows you to write notes about your intentions. Keep in mind that the answers from the chatbots can change over time - don’t expect them to always be the same! Phrasing a query slightly differently can result in a different answer, as can asking in a new session. You should expect to ask the AI tool the same question in multiple ways in order to be confident in the answer. 1.6 How to Use This Book This book is written with the expectation that users will follow along and paste the queries from each example into their own AI chatbot session. After you have finished this book, you will be able to use whichever combination of chatbot and programming environment that you’re comfortable using. 1.6.1 Choice of AI Model You can use whichever AI you like for this course, as long as it is a large language model. (Other types of AIs are not explored in this course.) We will use a variety of browser-based chatbots as examples throughout this course. Some AIs will be better suited for your particular tasks, and it is too early to judge which of the technologies will end up dominating the industry. We encourage you to test multiple AI platforms to find the ones that best fit your needs, and the use of a particular AI in each chapter should not be taken as an endorsement by us. We have focused on the browser-based chatbots for our examples in this course for ease of demonstration. 1.6.2 Computing Environment Needs All the examples and activities in this course can be done in the computer environment you prefer. None of the tasks require a huge amount of computing power, so local installations on your personal computer are fine, though everything will also translate to a high performance computing cluster or a cloud computing platform. In addition to the CodeWhisperer chatbot available on the AWS cloud platform, a variety of cloud-based AI tools and platforms exist that can be leveraged for practical projects after this course. 
For instance, Amazon SageMaker or Google Cloud AI Platform can be used for constructing and training machine learning models, while Microsoft Azure Cognitive Services can be applied to incorporate natural language processing features into your applications. If you’d like to learn more about how AI can be integrated with cloud computing, you can read about it in the Additional Learning section. 1.6.3 Boxes Throughout this course, you will see several types of specially-colored boxes. In particular, there are two to which you should pay special attention. One box highlights potential ethical issues with an AI usage. In this box, you’ll find reminders of the ethical considerations you should be particularly aware of while using AI to do certain tasks. The other box highlights our queries to AI chatbots. The question we posed to the AI chatbot is written in these boxes to make it easier for you to find. Wherever possible, we use screenshots of the actual answers we got from the AI chatbots in response to our questions. Occasionally when we have asked for the AI to write or fix code, we have formatted the answer in such a way that makes it easier to use. When you see this format, you will know it indicates an AI response Code in these boxes was written by AI. 1.7 About This Book We used AI bots (including ChatGPT, Bard, Sage, and Claude) to develop and write much of this course. We firmly believe this new technology can be used in ethical ways and we put into practice all the rules and suggestions offered over the following modules while we created this book. 
"],["ethics-of-using-ai.html", "Chapter 2 Ethics of Using AI 2.1 Learning Objectives 2.2 Major concerns 2.3 Replacing Humans 2.4 Bias 2.5 Misinformation 2.6 Unintelligible Code 2.7 Faulty or Buggy Code 2.8 Security and Privacy issues 2.9 Violating Copyright 2.10 Harmful code 2.11 Summary", " Chapter 2 Ethics of Using AI The use of artificial intelligence (AI) and in particular, generative AI, in coding has raised a number of ethical concerns. We will highlight several current concerns, however please be aware that this is a dynamic field and the possible implications of this technology is continuing to develop. It is critical that we as a society continue to evaluate and predict what the consequences of the use of AI will be, so that we can mitigate harmful effects. 2.1 Learning Objectives In this chapter we will demonstrate how to: Describe key ethical concerns for using AI tools in software development Identify possible mitigation strategies for these major concerns Explain the potential benefits of being transparent about the use of AI tools to write code Discuss why human contributions are still important and necessary Recognize strategies for using AI tools to write code more responsibly Implement prompts to ask AI tools about responsible use throughout various coding practices 2.2 Major concerns In this chapter we will discuss the following issues that writing code with AI tools may contribute to: Replacing Humans - AI tools can help humans write code, but they are not a replacement. Bias - AI models are built on data and code that were created by biased humans, this bias can be further perpetuated. Misinformation - fake or manipulated data used to help design algorithms could be believed to be correct and this could be further propagated. Unintelligible Code - Developers need to follow best practices with code generated by AI. 
Just like code generated by human developers, this includes figuring out how the code works, how it integrates with other code, as well as annotating and documenting how to use the code. Faulty or Buggy Code - Code that may not be optimal for a given situation may be inadvertently used by those with less familiarity, which may result in faulty software. Security or Privacy Issues - Code may not be optimized for security if not adequately evaluated. Generative AI models have access to large amounts of data that is currently unregulated and may include data that should be protected for privacy reasons. Copyright Violations - The generative AI model may have used code with copyright restrictions that require attribution or do not allow reuse, and it may not be clear where the code came from. Harmful Code - Currently it is not clear how well generative AI models restrict the creation of code that will be used for goals that harm others. Note that this is an incomplete list; additional ethical concerns will become apparent as we continue to use these new technologies. We highly suggest that users of these tools be transparent about the use of these tools, so that as new ethical issues emerge, we will be better prepared to understand the implications for specific coding projects. Be transparent about what AI tools you use to write code. This helps others to better understand how you created your code, as well as the possible sources that the AI tools might have used when helping you write code. It may also help with future unknown issues related to the use of these tools. Some organizations and scientific societies have created guidelines or requirements for using AI in journal articles and conference submissions, like the International Society for Computational Biology. Be aware of the requirements/guidelines for your field. It is essential to address these ethical concerns and ensure that the use of AI in coding is done in a responsible and transparent manner.
This could be done through ensuring the quality of the data used to train AI systems, promoting transparency in AI-generated code, and implementing safeguards against the creation of harmful or biased code. By doing so, we can harness the potential of AI to improve and transform the way we write and optimize code while maintaining ethical standards. 2.3 Replacing Humans Those who use AI tools to write code or to understand code need to recognize their own value in the process. While AI systems are useful, they do not replace the strengths that humans have for innovating new ways to write code, evaluating how the code integrates into the larger picture of a project, or in evaluating the downstream consequences of the code. Computer science is a field that has historically lacked diversity and new learners can often feel intimidated. There is the potential that new learners may feel even more discouraged as they learn to write code when witnessing AI tools write code. It is critical that we support diverse new learners of computer science, as we will continue to need human involvement in the development and use of AI tools. 2.3.1 Tips for supporting human code Follow best teaching practices for inclusion of diverse learners. Encourage current and future programmers to continue to innovate and write code. Avoid thinking that code written by computers must be better than code written by humans, as this simply is not true. Recall that humans first wrote they code that the AI tools use to write code. Be transparent about when you do or do not use AI tools, give credit to the humans who write code where possible. A new term in the medical field called AI paternalism describes the concept that doctors (and others) may trust AI over their own judgment or the experiences of the patients they treat. This has already been shown to be a problem with earlier AI systems intended to help distinguish patient groups. 
Not all humans will necessarily fit the expectations of the AI model if it is not very good at predicting edge cases (Hamzelou n.d.). Therefore, in all fields it is important for us to not forget our value as humans in our understanding of the world. 2.4 Bias One of the biggest concerns is the potential for AI to create biased code. AI systems are trained on data created by humans. If this data used to train the system is biased (and this includes existing code that may be written in a biased manner), the resulting code could also be biased. This could lead to discrimination, abuse, or neglect for certain groups of people, such as those with certain ethnic or cultural backgrounds, genders, ages, sexuality, capabilities, religions or other group affiliations. It is well known that data and code are often biased (Belenguer 2022). It is important that evaluations be made throughout the software development process to check for and consider potential perpetuation of bias. 2.4.1 Tips for avoiding bias Be aware of the potential biases in the data that is used to train AI systems. Check for possible biases within the data. Are there harmful data values? Examples could include discriminatory and false associations. Are the data adequately inclusive? Examples could include a lack of data about certain ethnic or gender groups or disabled individuals, which could result in code that does not adequately consider these groups, ignores them all together, or makes false associations. Are the data of high enough quality? Examples could include data that is false about certain individuals. Evaluate the code for biases as it is developed. Check if any of the criteria for weighting certain data values over others are rooted in bias. Consider the possible outcomes of the use of the code. Consider if the code could possibly be used in a manner that will result in discrimination. See Belenguer (2022) for more guidance. 
We also encourage you to check out the following video for a classic example of bias in AI: 2.5 Misinformation AI tools use data that may contain false information. AI tools may also report data as if it is real, when it is in fact not real. For example, at the time of writing this course, ChatGPT will report citations with links that are not correct. 2.5.1 Tips for reducing misinformation Be aware that some AI tools currently make up false information based on artifacts of the algorithm. Do not assume that everything that the algorithm produces is real or correct. 2.6 Unintelligible Code There is risk that those less aware of best coding practices use AI-generated code and do not follow these practices. This could make it difficult for others to understand how the code works and could make it hard to identify and fix any issues that may arise. This could result in negative consequences, such as system crashes or security breaches, that could have been avoided if the code had been written by an experienced and savvy human programmer. 2.6.1 Tips for avoiding unintelligible code Code should be reviewed by experienced programmers. Code should be annotated throughout to explain what the code is doing. Documentation should be created that describes how to use the code properly. Ask AI tools to help you understand the code, but get expert assistance where needed. Can you explain what the code in line 4 is doing? 2.7 Faulty or Buggy Code There is a risk that code written with AI tools, if not reviewed carefully enough, may be faulty. AI systems may not be trained on data that relates to current software dependencies or best practices. It is also important to remember that code generated by AI tools is not necessarily better than code written by humans. It requires just as much code review. 2.7.1 Tips for reducing faulty or buggy code Do not assume that the code generated by AI is correct. 
Realize that AI is only as good or up-to-date as what it was trained on; the code may be generated using out-of-date code. Look up packages and functions used to ensure that the code is up-to-date. Make sure that you understand the code that you are using. AI can be used to help you understand what the code is doing, but consult with experts when needed. Ask the AI tools for extra information about whether there are any potential limitations or weaknesses in the code, but keep in mind that the tool may not be aware of issues and therefore human review is required. The information provided by the tool can however be a helpful starting point. Are there any limitations associated with this code? What assumptions were made in writing this code? Stack Overflow, a popular community-based website where programmers help one another, has (at the time of writing this) temporarily banned users from answering questions with AI-generated code. This is because users were posting incorrect answers to questions. It is important to follow this policy (as you may face removal from the community). This policy goes to show that you really need to check the code that you get from AI models. While they are currently helpful tools, they do not know everything. 2.8 Security and Privacy issues AI tools are not always trained in a way that is particularly conscious of data security. Furthermore, if the code is written using these tools by users who are less familiar with coding security concerns, protected data or important passwords may be leaked within the code itself. AI systems may also utilize data that was actually intended to be private. It is also important to consider what data your code might actually be using. 2.8.1 Tips for reducing security and privacy issues Check that all passwords, access tokens (like API keys), and security certificates are not saved in a public place where anyone can access or tamper with them. 
Check that no sensitive data, such as Personally Identifiable Information (PII), becomes public through the code. Utilize encryption and other security practices where needed. Consult with an expert about data security if you think your code could possibly cause someone to access protected data who is not authorized to access it. Promote regulation of AI tools by voting for such standards where possible. Ask AI tools for help, but do not rely on them alone. Are there any security or privacy concerns associated with this code? 2.9 Violating Copyright When AI systems are trained on data, they may also learn and incorporate code from that data. This means that AI-generated code could potentially infringe on the copyright of the original author of the code. For example, if an AI system is trained on a GitHub repository that contains code written by a human programmer, the AI system could generate code that is identical to or similar to the code in the GitHub repository. If the AI system then uses this code without permission from the original author, this could constitute copyright infringement. In general, we want programmers to feel comfortable sharing their code openly without fear they won’t be credited. Similarly, AI systems could potentially infringe on intellectual property rights by using code that is protected by trademarks or patents. For example, if an AI system is trained on a training manual that contains code that is protected by a trademark, the AI system could generate code that is identical to or similar to the code in the training manual. If the AI system then uses this code without permission from the trademark owner, this could constitute trademark infringement. 2.9.1 Tips for avoiding copyright violations Be transparent about what AI tools you use to write your code. Obtain permission from the copyright holders of any code that you use to train the AI system. 
Only use code that is in the public domain or that has been licensed for use by the AI system’s owner. Cite any GitHub repositories or training manuals you might use in your code. Ask the AI tools if the code it helped generate used any code that you can cite. Did this code use any code from others that I can cite? 2.10 Harmful code Another major concern is the use of AI to generate malicious code. For instance, AI could be used to create code that spreads malware or hacks into computer systems. This could cause severe damage to individuals and organizations, including data breaches and financial losses. 2.10.1 Tips for avoiding the creation of harmful code Be careful about what code you share publicly, as it could be used for malicious purposes. Consider how the code might be used by others. Ask the AI tools to help you, but do not rely on them alone. What are the possible downstream uses of this code? What are some possible negative consequences of using this code? 2.11 Summary Here is a summary of all the tips we suggested: Disclose when you use AI tools to write code. Credit human authors by citing their code and adhering to copyright restrictions. Ensure the code doesn’t expose sensitive data, such as passwords or PII. Don’t assume AI-generated content is real, accurate, or current. Ask the AI tools to help you understand: Sources for the code that you can cite What the code is doing Code limitations Potential security or privacy issues Potential downstream consequences of the code Always have expert humans review the code and value your own contributions and thoughts. Overall, we hope that these guidelines and tips will help us all to use AI tools more responsibly for writing code. We recognize, however, that this is an emerging technology and that more ethical issues will emerge as we continue to use these tools in new ways. 
AI tools can even help us to use them more responsibly when we ask the right additional questions when writing code, but remember that human review is always necessary. Staying up-to-date on the current ethical considerations will also help us all continue to use AI responsibly. "],["writing-code.html", "Chapter 3 Writing Code 3.1 Learning Objectives 3.2 Coding Basics 3.3 Writing Code with AI 3.4 Tips for Coding with AI 3.5 Example One: Creating a Function 3.6 Example Two: Debugging Code 3.7 Example Three: Comparing Syntax 3.8 Example Four: Figuring Out a Common Task 3.9 Example Five: Planning Your Program 3.10 Example Six: Selecting a Coding Language 3.11 Hands On Exercise 3.12 Summary", " Chapter 3 Writing Code 3.1 Learning Objectives Recognize the benefits and capabilities of AI-powered programming tools, particularly for certain kinds of tasks that are difficult for humans Explain how AI-generated code and automated programming tools contribute to making programming more accessible to individuals with limited coding experience or to those working in a new language Explore the ethical considerations associated with using AI in programming, including the importance of rigorous testing before publishing code Apply strategies for using AI to build functions, debug code, understand error messages, check syntax, choose the proper programming language, and plan out a multi-step programming project 3.2 Coding Basics In today’s digitally-driven world, coding has become an essential skill. It is used to create websites, mobile applications, and sophisticated software systems. Contrary to popular belief, coding is not only for those who possess an innate talent for programming. With the right resources and a willingness to learn, anyone can become proficient in coding. Practice is crucial to mastering the art of coding, and those who are willing to commit themselves to the learning process can achieve great results. 
Writing code can also be a rewarding and fulfilling hobby, too. Creating your own software, building a website, or automating a task can be immensely satisfying, and can provide a sense of accomplishment and pride. Moreover, writing code can be a form of creative expression. Just as musicians use notes to compose music and artists use paint to create visual art, programmers use code to create digital experiences. From building websites to creating video games, code can be used to bring ideas to life in exciting and innovative ways. Ultimately, the ability to write code is a valuable asset in today’s world, with increasing demand for skilled programmers in a wide range of industries. Whether as a career or hobby, coding can be both personally and professionally rewarding. The accessibility and versatility of coding make it an excellent skill for anyone to learn, regardless of their background or previous experience. 3.3 Writing Code with AI AI-powered programming tools and technologies are revolutionizing the way we approach software development. AI can generate code snippets. An AI-powered code editor can suggest more efficient algorithms or highlight potential errors in real-time. It can also help developers to reproduce bugs more easily and to test potential fixes more quickly. This can save developers a significant amount of time and effort, and it can help them to debug their code more effectively. Be aware that AI still makes a lot of mistakes when writing code, so you will need to be particularly careful in testing code it writes. AI can make programming more accessible. With AI-generated code and automated programming tools, individuals who are not experienced developers can still create software solutions. Experienced developers can also more easily write code in languages with which they aren’t familiar. This has the potential to democratize programming and lead to new innovations from a wider range of people. 
Can complete novices learn how to program using AI tools? At present, it is very challenging for complete novices to rely on AI chatbots to learn how to write code. These tools will sometimes write buggy code or code that does not work as intended, and a person with no programming experience will have difficulty learning how to spot these sorts of mistakes. AI chatbots are best thought of as a supplement to your software development education, not as a replacement, and they should be used alongside other resources. Therefore, we advise that you seek expert review and assistance. As AI tools are refined, complete novices may be able to create software products more readily on their own. AI is changing the skills required for writing code. Rather than simply memorizing syntax, developers need to focus on developing their creativity, intuition, and problem-solving skills. Additionally, programmers must also hone their abilities in architecture design and project management, as these skills are becoming increasingly important in the modern software development landscape. As AI continues to evolve, it is likely to have an even greater impact on the way we develop software in the future. As a programmer using AI, it is up to you to be responsible for what you create. You must rigorously test any code you write. It is your job to make sure any code you create with AI is not malicious and works as expected. 3.4 Tips for Coding with AI Using AI to write code is as much of an art as writing your own code. Following a few rules and suggestions will make your experience much more enjoyable and productive. Remember that AI cannot read your mind. You need to tell it what you’re thinking, and it may take several rounds of question-and-answer before you get something you can work with. Be clear about your goals. What do you want the AI to do? Be as specific as possible. Start with a simple task. Don’t try to do too much too soon. 
Start with a simple task and then gradually work your way up to more complex tasks. You wouldn’t write a complete program all at once, and you shouldn’t have AI do it, either. Provide enough data. The more data (and detail) you can give the AI, the better it will be able to learn and perform the task you want it to do. Be patient. Querying an AI can take time. Don’t expect it to be perfect right away. Test everything AI writes. You are responsible for the code that AI generates, even if you didn’t write it yourself. AI can be biased, just like humans, and this could lead to suboptimal development. There is a concern that AI may focus development on areas that optimize certain metrics but not overall quality, effectiveness, or user needs. Human judgment is still needed to ensure that AI is used in a responsible and ethical way. Test your results. Make sure the AI is actually doing what you want it to do. AI systems are trained on data, and if the data is not representative of the real world, the AI system may not be able to generate code that works correctly. It is important to test the code to make sure that it is working as intended and that it is producing the desired results. Inadequate testing could lead to low-quality, buggy code. Since AI-generated code would be highly automated, it requires strong testing procedures and safety oversight to ensure it meets standards before deployment. Make sure your code is secure. AI systems can be used to generate code that is vulnerable to attack, and it is important to test the code to make sure that it is secure. This can help to prevent security breaches from occurring. Use a variety of AI models. There are many different AI models available. Experiment with different models to see which one works best for your task. Get feedback from others. Ask other people to review your code and give you feedback. This can help you identify and fix any problems with your code. 
3.5 Example One: Creating a Function One of the most common things a programmer does is write functions to automate tedious tasks. With proper guidance, AI can speed this process up. Perhaps you need to write a quick function that reverses all the letters in a string. We can query Phind for help. How do I write a function that reverses the letters in a string? To reverse a string in Bash, there are multiple approaches. One way to reverse a string is by using the rev command. According to StackOverflow, the rev command is installed on most Linux systems by default. To use the rev command, simply enter rev followed by the string you want to reverse. For example: $ echo "12345" | rev However, it is important to note that the rev command is not present in all flavors of Unix. Another way to reverse a string is by using a simple Bash script. According to StackOverflow, we can write a Bash script that reverses a string using a for loop. Here is an example script: var="12345" copy=\\${var} len=${#copy} for((i=$len-1;i>=0;i--)); do rev="$rev\\${copy:$i:1}"; done echo "var: $var, rev: $rev" This script takes the string “12345” and stores it in the variable var. It then creates a copy of the string and stores it in the variable copy. The length of the string is stored in the variable len. The for loop starts at the end of the string and iterates through each character, adding it to the variable rev. Finally, the script prints the original string and the reversed string. Another approach to reversing a string is by using the tac command. According to StackOverflow, we can use the tac command to reverse the contents of a file or the standard input. Here is an example command: $ echo "12345" | sed 's/./&\\n/g' | tac | sed -e :a -e 'N;s/\\n//g;ta' This command uses the sed command to insert a newline character after each character in the string. It then passes the output to the tac command, which reverses the order of the lines. 
Finally, another sed command is used to remove the newline characters and print the reversed string. Overall, there are multiple approaches to reversing a string in Bash. The rev command, a simple Bash script with a for loop, and the tac command are all viable options. Phind offers you a variety of ways to do this, but all these options are in the language Bash. Because we didn’t include a programming language in our query, Phind picked one for us. Let’s refine our query a little. How do I write a function using Python that reverses the letters in a string? To write a function using Python that reverses the letters in a string, there are several approaches that can be taken. One approach involves using a for loop and the concatenation operator to join the original string in reverse order. This technique is demonstrated in realpython.com as follows: def reverse_string(text): result = "" for char in text: result = char + result return result print(reverse_string("Hello, World!")) # Output: '!dlroW ,olleH' In this example, the function reverse_string takes a string as its argument and initializes an empty string result. The for loop iterates over each character in the input string, concatenating it with the current content of result and reassigning the new intermediate string to result at each iteration. At the end of the loop, result holds a new string as a reversed copy of the original one. Another approach involves using slicing to reverse the string. This technique is demonstrated in stackoverflow.com as follows: def reverse_string(text): return text[::-1] print(reverse_string("Hello, World!")) # Output: '!dlroW ,olleH' In this example, the function reverse_string takes a string as its argument and returns the string in reverse order using slicing. The syntax text [::-1] means to slice the entire string with a step of -1, which reverses the order of the string. A third approach involves using Python’s built-in reversed() function to reverse the string. 
This technique is demonstrated in datagy.io as follows: def reverse_string(text): return ''.join(reversed(text)) print(reverse_string("Hello, World!")) # Output: '!dlroW ,olleH' In this example, the function reverse_string takes a string as its argument and uses the reversed() function to return an iterator that produces the characters of the string in reverse order. The join() method is then used to convert the iterator back into a string. All three approaches produce the same output, but they differ in terms of performance and readability. The first approach using a for loop and concatenation is the most straightforward, but it can be slower for large strings because it creates successive intermediate strings only to throw them away in the next iteration. The second approach using slicing is more efficient because it creates a new string object only once, but it may be less readable for some programmers. The third approach using reversed() is concise and readable, but it may be slower for large strings because it creates a reversed iterator that must be converted back into a string using join(). Once again, Phind has come through with multiple options. Now let’s say you knew you wanted to write a function like this, but you realize it’s only the strings in the third column of your file. Let’s add that requirement to our query. “How do I write a function using python that reverses the letters in a string for the third column of my file?” To write a function using Python that reverses the letters in a string for the third column of a file, you can use the extended slice syntax in Python. The extended slice syntax is a more concise and faster way of slicing strings in Python. To reverse a string using extended slices, you can use the following syntax: string[::-1] In this syntax, the first colon indicates the start of the slice, the second colon indicates the end of the slice, and the third colon indicates the step size of the slice. 
By setting the step size to -1, you can reverse the string. To create a function that reverses the letters in a string for the third column of a file, you can use the following code: def reverse_string(string): return string[:2] + string[2:5][::-1] + string[5:] In this code, the reverse_string() function takes a string as input and returns the string with its third column reversed. The first two characters of the string are returned as-is, while the third column (characters 3 to 5) are reversed using extended slices. Finally, the remaining characters of the string are returned as-is. There are other ways to reverse a string in Python, including using loops, recursion, stacks, and function calls. However, using extended slices is the most concise and efficient way to reverse a string in Python. Here are some pros and cons of using extended slices to reverse a string: Pros: Concise syntax: using the extended slice syntax is a one-liner that is easy to read and write. Fast execution: using the extended slice syntax is faster than using loops or other methods to reverse a string. Cons: Limited functionality: the extended slice syntax can only be used to reverse a string, and cannot be used for more complex string operations. Not beginner-friendly: the extended slice syntax is not intuitive for beginners who are not familiar with Python’s slicing syntax. In conclusion, to write a function using Python that reverses the letters in a string for the third column of a file, you can use the extended slice syntax in Python. This syntax is concise and efficient, but may not be beginner-friendly and may not be suitable for more complex string operations. 3.6 Example Two: Debugging Code A vast amount of time in programming is spent on debugging code. Often the compiler error messages are very precise about what has gone wrong while simultaneously lacking enough detail for the programmer to immediately understand how to fix the error. 
It is good practice to debug code as you go, instead of writing the entire program and then testing to identify the errors. Using an AI chatbot makes this task much, much easier. There are two options for using ChatGPT for debugging. First, we could query what the error code means. Here’s a Python code snippet that is throwing a TypeError: unsupported operand type(s) for +: 'int' and 'str' error when we try running it. def add_numbers(a, b): result = a + b return result number1 = 10 number2 = "20" sum = add_numbers(number1, number2) print("The sum of", number1, "and", number2, "is", sum) What does this error mean? TypeError: unsupported operand type(s) for +: ‘int’ and ‘str’ This error typically occurs when you try to concatenate (i.e., add together) a string and an integer using the “+” operator in Python. In Python, the “+” operator is used for both addition and concatenation depending on the data types of the operands. If you try to add a string and an integer using the “+” operator, Python will raise a TypeError because these two data types are not compatible for this operation. For example, if you have the following code: age = 25 message = "I am " + age + " years old." Python will raise the TypeError because you are trying to concatenate a string (“I am”) with an integer (25) using the “+” operator. To fix this error, you can convert the integer to a string using the str() function: age = 25 message = "I am " + str(age) + " years old." In this case, the str() function converts the integer age to a string before concatenating it with the other string, so the operation is valid and no error will be raised. ChatGPT helpfully explains what TypeError means, what causes it, and how to fix it. This is quite useful when you don’t understand the error message. Another option is to submit the code snippet that is causing an error to ChatGPT directly and ask for it to fix the mistake. Here’s another buggy code snippet, this time in C#. Can you debug this code? 
using System; class Program { static void Main() { int[] numbers = { 1, 2, 3, 4, 5 }; int sum = 0; for (int i = 1; i <= numbers.Length; i++) { sum += numbers[i]; } Console.WriteLine("The sum of the numbers is: " + sum); } } Sure, I can help you debug this code. Here are the issues I see: The loop is going out of bounds: The loop is iterating from 1 to numbers.Length, but array indexes start from 0. Therefore, when i is equal to numbers.Length, numbers[i] is trying to access an element outside the bounds of the array, which will result in an IndexOutOfRangeException. To fix this, you should change the loop condition to i < numbers.Length. The loop is not summing the array elements correctly: The loop is adding the elements starting from the second element instead of the first. To fix this, you should change the initial value of i to 0, which corresponds to the first element of the array. Here’s the corrected code: using System; class Program { static void Main() { int[] numbers = { 1, 2, 3, 4, 5 }; int sum = 0; for (int i = 0; i < numbers.Length; i++) { sum += numbers[i]; } Console.WriteLine("The sum of the numbers is: " + sum); } } This should correctly sum the elements of the numbers array and output the result. If you choose to submit your code directly to an AI chatbot to debug, submit one snippet at a time (if possible). Otherwise the answer can get quite unwieldy! 3.7 Example Three: Comparing Syntax You may be in the middle of writing code when you realize that you’ve used two possible syntaxes for the same command. Perhaps you are using the R package here and are wondering if two different ways of writing a command are the same. Is here::here(\"raw\", \"data.csv\") equivalent to here::here(\"raw/data.csv\")? First, let’s see what ChatGPT responded. 
When we first started putting together this course, we had intended to include an answer from Bard that disagreed with the answer from ChatGPT as an example of why it can be helpful to query multiple chatbots. (We decided ChatGPT was correct after verifying its answer in the here package documentation.) However, when we queried Bard again two weeks later, we were pleasantly surprised to discover that Bard had corrected its answer. Now it’s a cool little example of how the chatbots are rapidly improving and “learning” from their mistakes. 3.8 Example Four: Figuring Out a Common Task Sometimes you have a task that needs to be done that is fairly routine. You might not have done it before, but plenty of people in the world have. In this case, you don’t want to reinvent the wheel, but you may not know exactly where to start. AI is a great option for figuring out the basic workflow steps. A very common task for anyone working with genomic samples is to combine multiple files in the vcf format. Perhaps you’ve been asked to do this and then create a separate file that contains the names of the samples. You turn to ChatGPT for help. Can you write a program that merges vcf files and then outputs the sample names? Great! Of course, maybe you’re not comfortable using bcftools, or you try this and can’t get it to work. You can take advantage of the “Regenerate Response” option in ChatGPT to get another solution. What’s another way I can do this? You can also specify some of the features you’d like in your code, or where you might want the code to come from. Maybe you want to practice using GitHub repositories, so you specifically request it. Can you write a program that merges vcf files and then outputs the sample names using a github repo? If you know you’ve used something like a GitHub repository, you can ask ChatGPT for the citation. Just be aware that you need to independently verify any citation it gives you. 
Unfortunately, AI bots can still respond with imaginary journal articles. (Assume that if you can’t find it, it doesn’t exist.) What’s the citation for the repo used in this code? In general, after you find a solution that works for you, you should always make a habit of asking the AI where it got the code. Never, ever, ever run code without knowing where it came from. What’s the source for this code? You should always take care that you are only using code that you’re authorized to use when writing code. This means asking the AI chatbot where code snippets are from and tracking down those citations and authorizations yourself (instead of just trusting the AI that everything is okay). 3.9 Example Five: Planning Your Program Let’s say we want to create software that plays the card game blackjack. You can jump right in and ask your AI chatbot to write the code, but you’re unlikely to be successful and incredibly likely to get frustrated. Before you start interacting with the AI, you need to plan out what you want to do. Everyone needs a plan of attack when building software. You should start by identifying what your software will do, then you can figure out the steps necessary to make that happen. We’ll ask Claude-instant (as implemented by Poe) how we can go about creating the necessary code for our game. How would I go about writing a program that can play blackjack? That seems like a really good roadmap for building a basic game, but if you’re ambitious (and of course you are), you might want to give your game a little extra something. Claude-instant has some ideas for that, too. What are some other features I could add to this game? At this point, you are ready to begin writing your program. You have a lot of tasks to do, so you have to pick a logical starting point. If you’re uncertain, you can ask AI for advice on your first task. What should be my first coding task for this program? 
In this case, the AI went beyond just suggesting a first task and also offered several options for how to do it. 3.10 Example Six: Selecting a Coding Language You will need to choose the language for your software. If you don’t, AI will likely pick a language for you, which may or may not be one you want. You might choose a language based on: Your preexisting knowledge or experience with the language A specific use case where the language is commonly used The language(s) your colleagues or coworkers use You can find a description of other languages you might encounter here. Let’s ask Claude-instant again for some recommendations on the appropriate languages for our proposed blackjack game from the above example. (Pro tip: the various AI bots all keep track of your previous queries, so you don’t have to restate your goal for the software if you’re continuing the conversation.) What would be a good coding language to create this game of blackjack? Claude-instant actually gave us more information than we anticipated. If it hadn’t stated pros and cons while offering language suggestions, we could ask for them directly. This could be helpful if you’re deciding between two options like, say, Python or Java. What are the pros and cons of using Python vs Java for this project? 3.11 Hands On Exercise Now it’s your turn to try! 3.11.1 Code Here’s the code you need for question #2. def calculate_average(numbers): total = 0 for number in numbers: total += number average = total / len(numbers) return average numbers = [1, 2, 3, 4, "5"] average = calculate_average(numbers) print("The average of the numbers is:", average) 3.11.2 Questions Write a prompt to create a function that replaces all instances of the string “three” with the number 3, then create a prompt that allows you to modify the function so that “three”, “Three”, and “THREE” are all replaced with the number 3. Write a query that helps you debug the code snippet from above.
Create a query that explains how to retrieve data from an online database. Don’t forget to find out what the source of the code is. Write a prompt to plan out a program that creates and plays a game of Rock, Paper, Scissors. You should create a multi-step plan as well as figure out what your first coding task should be. Create a prompt that identifies the best coding language for the above game. 3.12 Summary Coding has become an essential skill in today’s world. AI can increase the efficiency of planning, writing, and debugging code, as well as make coding more accessible for beginners. Querying AI on code writing tasks requires patience. AI cannot read your mind, and users need to be both specific and patient when asking for coding help. It is essential to test all AI-written code while in development, since inadequate testing could lead to low quality, buggy code. A programmer is responsible for what they create while coding with AI, even if AI writes the actual code. devtools::session_info() ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2023-07-17 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2023-03-28 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.0.2) ## cachem 1.0.7 2023-02-24 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## curl 4.3 2019-12-02 [1] RSPM (R 4.0.3) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) ## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) 
## evaluate 0.20 2023-01-17 [1] CRAN (R 4.0.2) ## fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## highr 0.8 2019-03-20 [1] RSPM (R 4.0.3) ## hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0) ## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.0.2) ## httr 1.4.2 2020-07-20 [1] RSPM (R 4.0.3) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2023-03-28 [1] Github (yihui/knitr@a1052d1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.0.2) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## ottrpal 1.0.1 2023-03-28 [1] Github (jhudsl/ottrpal@151e412) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.3) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 [1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## readr 1.4.0 2020-10-05 [1] RSPM (R 4.0.2) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.0 2023-03-14 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2023-03-28 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.2) ## sass 0.4.5 2023-01-24 [1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2023-03-28 [1] Github (R-lib/testthat@e99155a) ## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.0.2) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## utf8 1.1.4 2018-05-24 [1] RSPM (R 4.0.3) ## vctrs 0.6.1 2023-03-22 [1] CRAN (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2023-03-28 [1] Github (yihui/xfun@74c2a66) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] 
/usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["refactoring-code.html", "Chapter 4 Refactoring Code 4.1 Learning Objectives 4.2 Refactoring Basics 4.3 AI in Code Refactoring 4.4 Example One: Correcting Syntax 4.5 Example Two: Styling and Convention 4.6 Example Three: Staying DRY 4.7 Example Four: Keeping Code Alive 4.8 Example Five: Making Code Readable 4.9 Example Six: Keep it Brief 4.10 Code optimization 4.11 Challenges and Limitations 4.12 Hands-On Exercise 4.13 Summary", " Chapter 4 Refactoring Code 4.1 Learning Objectives Describe how refactoring code involves optimization for maintainability, efficiency, and reuse Explain why refactoring code is important for developers in the long-term Recognize the benefits and limitations of using AI tools to refactor code, as well as why AI tools are uniquely poised to be beneficial Implement prompt strategies that can be used to assist with refactoring code for correcting syntax, for adopting more consistent styling, for making code more concise, for making code easier to maintain, and for making code more efficient 4.2 Refactoring Basics Code refactoring is the process of improving the quality of underlying code without changing its functionality. In other words, it’s a way of cleaning up and optimizing code so that it’s easier to maintain and more efficient. This often involves making small changes to the code, such as renaming variables or functions, reorganizing code blocks, or simplifying complex expressions. Refactoring is an essential practice in software development and helps to ensure that the codebase remains manageable and adaptable as requirements and business needs change over time. Code refactoring helps to reduce technical debt, which is the accumulation of development work that needs to be done in the future as a result of taking shortcuts or using less than optimal solutions. 
Refactoring can help to identify potential issues and bugs before they cause problems, resulting in more stable software. Finally, refactoring code to be more readable and less verbose can create fewer headaches for you in the future. To understand why code refactoring is important, imagine trying to cook in a disorganized and cluttered kitchen. You might be a fantastic cook, but you are spending more time searching for utensils and ingredients, and the quality of the meal could suffer as a result. Similarly, code that is disorganized and cluttered can be difficult to work with and can result in lower-quality software. Refactoring is like cleaning and organizing the kitchen - it makes the process of cooking (or coding) smoother and more efficient. 4.3 AI in Code Refactoring Code refactoring has historically been done manually by developers. This involves reviewing code and identifying areas that could be improved or optimized, and then making changes to the codebase accordingly. Though important, this process is time-consuming and labor-intensive, as it requires developers to carefully review every line of code to identify potential issues or areas for improvement. Additionally, manual code refactoring is error-prone, as developers can accidentally introduce bugs or errors into the codebase while making changes. However, AI has significant potential to help with code refactoring. AI can use machine learning algorithms to analyze large amounts of code and identify patterns or areas that could be improved. For example, it can identify sections of code that are redundant, overly complex, or difficult to maintain, and suggest changes that could be made to improve the codebase. Machine learning algorithms can also help to identify potential bugs or security issues in the codebase, which can help to improve the overall quality and stability of the software. AI refactoring is also faster and more accurate than manual refactoring.
This is particularly useful for large-scale software projects with massive codebases, where manual code review and refactoring can be an enormous task. In the next sections, we’ll take a look at some examples of using AI to refactor code. The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information is protected, are unknown. Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure. 4.4 Example One: Correcting Syntax Programming languages operate using their established set of rules. For example, Python uses indentation to define code blocks. However, when you shift between development environments, indentation and line breaks can sometimes be affected, causing major headaches! Let’s say we run the following code: def calculate_sum(a,b): result = a + b return result def calculate_product(a,b):result = a * b return result def main(): a=5 b=6 sum_result = calculate_sum(a, b) print("Sum:", sum_result) product_result = calculate_product(a,b) print("Product:", product_result) if __name__ == '__main__': main() ## unexpected indent (<string>, line 3) We get the error above because our indentation is off. Let’s ask ChatGPT to clean things up. Can you assist me with correcting the formatting and indentation issues in my Python code? Of course!
Here is the code after correcting the formatting and indentation: def calculate_sum(a, b): result = a + b return result def calculate_product(a, b): result = a * b return result def main(): a = 5 b = 6 sum_result = calculate_sum(a, b) print("Sum:", sum_result) product_result = calculate_product(a, b) print("Product:", product_result) if __name__ == '__main__': main() ## Sum: 11 ## Product: 30 Much better! This can be quite handy when you can’t immediately see what is wrong with the formatting or if you are working with a large codebase. 4.5 Example Two: Styling and Convention Sometimes programming languages have more flexible rules where there are multiple ways to take the same or almost the same action. For example, in the R programming language, there are several ways to perform variable assignment. However, it is generally accepted among R programmers that the <- operator should be used. Single and double quotes can be used for character types interchangeably, but it’s generally a good idea to stick with one or the other. 4.5.1 Commonly Accepted Convention Let’s say we have the following code: a <- "apple" b = 'banana' "carrot" -> c c(a, b, c) ## [1] "apple" "banana" "carrot" This code does not produce any errors, but it doesn’t follow typical R convention. Let’s ask ChatGPT to clean things up. Can you help me reformat my R code to follow accepted style conventions? Sure, here’s the cleaned-up R code: a <- "apple" b <- "banana" c <- "carrot" Much better! This code will now look more familiar to other R programmers. 4.5.2 Visual Styling Some languages are flexible about line breaks, indentation, and spacing among code blocks. If you’re rapidly developing, things can get messy pretty quickly. Many development environments, for example RStudio and PyCharm, have the ability to format your code for you without changing any of the code itself. However, you can also get AI to do this according to your own preferences.
d <- c(rep(a,5),rep(b,5),rep(c,5),paste(a,b,c),paste(c,b,a)) This code is pretty hard to read. Let’s see if AI can do better. Can you space out this R code to make it more readable? Sure! Here’s the same code with added spacing for improved readability: d <- c(rep(a, 5), rep(b, 5), rep(c, 5), paste(a, b, c), paste(c, b, a)) Better! It’s easier to interpret because only one action is being taken on each step. By spacing out the code and breaking it into multiple lines, it becomes easier for the human eye to scan for structure, logic, errors, and/or inconsistencies. 4.6 Example Three: Staying DRY The DRY (Don’t Repeat Yourself) programming tenet is essential to good software. By avoiding repetition in code, we can reduce the amount of duplicate code we need to write and maintain. When code is repeated, changing the functionality of a certain piece of code means making the same change in multiple places, which can be time-consuming and error-prone. Let’s take the following Python code: def print_greetings(): print("Hello, Alice!") print("Hello, Bob!") print("Hello, Carol!") print("Hello, David!") print("Hello, Eve!") print_greetings() ## Hello, Alice! ## Hello, Bob! ## Hello, Carol! ## Hello, David! ## Hello, Eve! Notice how repetitive this code is? print() and the string \"Hello, \" appear five times. We can improve this! Could you please refactor my Python code to make it more DRY? Sure, here’s a refactored version of your code: def print_greeting(name): print(f"Hello, {name}!") def print_greetings(): names = ["Alice", "Bob", "Carol", "David", "Eve"] for name in names: print_greeting(name) print_greetings() ## Hello, Alice! ## Hello, Bob! ## Hello, Carol! ## Hello, David! ## Hello, Eve! Now let’s imagine our software lead has asked us to change the greeting to Spanish.
Notice how we only have to change the second line of code: def print_greeting(name): print(f"¡Hola, {name}!") def print_greetings(): names = ["Alice", "Bob", "Carol", "David", "Eve"] for name in names: print_greeting(name) print_greetings() ## ¡Hola, Alice! ## ¡Hola, Bob! ## ¡Hola, Carol! ## ¡Hola, David! ## ¡Hola, Eve! Much easier. 4.7 Example Four: Keeping Code Alive During software development, code is often changing rapidly. It’s common to end up with functions or variables you don’t need anymore. It’s best practice to remove “dead” code from your codebase: it’s distracting and can make maintenance more challenging. Programmers should resist the urge to hang on to unused code in the expectation that they might need it one day. Version control, such as that provided by Git and GitHub, lets you keep a record of old code without cluttering your final version. Let’s take a look at some Python code with some dead code: def default_values(): a = 5 b = 6 return a, b def calculate_sum(a, b): result = a + b return result def calculate_product(x, y): return x * y def simulate_data(): a, b = default_values() dat = calculate_sum(a, b) + calculate_sum(a, b) return dat def main(): a = 5 b = 6 a, b = default_values() sum_result = calculate_sum(a, b) print("Sum:", sum_result) if __name__ == '__main__': main() ## Sum: 11 Some of these functions aren’t used in the final step, typically defined in Python as the main() function. Let’s ask AI to remove them. I need help with eliminating unused functions in my Python script. Sure! Here’s a refactored version: def default_values(): a = 5 b = 6 return a, b def calculate_sum(a, b): result = a + b return result def main(): a = 5 b = 6 a, b = default_values() sum_result = calculate_sum(a, b) print("Sum:", sum_result) if __name__ == '__main__': main() ## Sum: 11 Great! Now our code is much more manageable. This code only includes the necessary functions needed to run the main() function. We can go a step further!
Can you help me refactor my python code to make it more concise? Certainly! Here’s a more concise version of your code: def calculate_sum(a=5, b=6): return a + b def main(): sum_result = calculate_sum() print("Sum:", sum_result) if __name__ == '__main__': main() ## Sum: 11 In this version, the calculate_sum function is simplified to take default arguments for a and b. This means the default_values function is no longer necessary. The a and b variables are also removed from main, as they are no longer needed. Much better and easier to look at overall. 4.8 Example Five: Making Code Readable Usually, our first goal as programmers is to get the response we need from the code. Readability might be the last thing on our minds. Luckily, AI can help with this. Let’s imagine we whipped up this code pretty quickly. It is tough to read, and it could definitely be improved. for(term in c("good","bad","ugly")){ print(c(rep(term,3),paste("really",stringr::str_to_upper(term)))) } ## [1] "good" "good" "good" "really GOOD" ## [1] "bad" "bad" "bad" "really BAD" ## [1] "ugly" "ugly" "ugly" "really UGLY" Let’s see if AI can do better. Can you help me clean up this R code so it is more readable? Sure! Here’s a more readable version of the code: library(stringr) for(term in c("good", "bad", "ugly")) { print(c(rep(term, 3), paste("really", str_to_upper(term)))) } ## [1] "good" "good" "good" "really GOOD" ## [1] "bad" "bad" "bad" "really BAD" ## [1] "ugly" "ugly" "ugly" "really UGLY" I added whitespace around operators and after commas for better readability. I also loaded the stringr package explicitly using library() to make the code more self-contained. This is a good start, but not much has changed. Let’s try again. On ChatGPT, we can click “Regenerate Response”. 
Here’s a cleaned-up version of the R code that is more readable: library(stringr) terms <- c("good", "bad", "ugly") for (term in terms) { rep_term <- rep(term, 3) upper_term <- str_to_upper(term) really_upper_term <- paste("really", upper_term) result <- c(rep_term, really_upper_term) print(result) } ## [1] "good" "good" "good" "really GOOD" ## [1] "bad" "bad" "bad" "really BAD" ## [1] "ugly" "ugly" "ugly" "really UGLY" I’ve stored the repeated term and the uppercased term in separate variables to make the code more readable. I’ve also separated the construction of the “really” phrase from the uppercased term to make it clearer what’s happening. Finally, I’ve used whitespace and indentation to make the structure of the code more apparent. Better! While the code takes up a greater number of lines, it’s easier to interpret because only one action is being taken on each step. 4.9 Example Six: Keep it Brief As we learned above, following the DRY principle and removing dead code can make your code easier to interpret, maintain, and debug. However, sometimes the issue is the functions themselves. No one has perfect knowledge of everything in their programming language. AI can help us determine if there are any more efficient sets of functions that do the same thing. Here is some code that plays with the “CO2” dataset in R: library(datasets) for (type in unique(CO2$Type)) { dat_subset <- CO2[CO2$Type == type, ] for (plant in unique(dat_subset$Plant)) { dat_subset_2 <- dat_subset[dat_subset$Plant == plant, ] print(paste(dat_subset_2$Plant[1], round(mean(dat_subset_2$uptake), 1))) } } ## [1] "Qn1 33.2" ## [1] "Qn2 35.2" ## [1] "Qn3 37.6" ## [1] "Qc1 30" ## [1] "Qc2 32.7" ## [1] "Qc3 32.6" ## [1] "Mn1 26.4" ## [1] "Mn2 27.3" ## [1] "Mn3 24.1" ## [1] "Mc1 18" ## [1] "Mc2 12.1" ## [1] "Mc3 17.3" Let’s see if AI has any ideas for making it more concise. Can you make this code more concise? 
Here’s a more concise version using dplyr package in R: library(dplyr) CO2 %>% group_by(Type, Plant) %>% summarize(mean_uptake = mean(uptake)) ## # A tibble: 12 × 3 ## # Groups: Type [2] ## Type Plant mean_uptake ## <fct> <ord> <dbl> ## 1 Quebec Qn1 33.2 ## 2 Quebec Qn2 35.2 ## 3 Quebec Qn3 37.6 ## 4 Quebec Qc1 30.0 ## 5 Quebec Qc3 32.6 ## 6 Quebec Qc2 32.7 ## 7 Mississippi Mn3 24.1 ## 8 Mississippi Mn2 27.3 ## 9 Mississippi Mn1 26.4 ## 10 Mississippi Mc2 12.1 ## 11 Mississippi Mc3 17.3 ## 12 Mississippi Mc1 18 This code groups the CO2 dataset by Type and Plant, calculates the mean uptake for each group using summarize, and then prints the resulting data frame. The code is shorter and more concise because it uses the dplyr package, which has built-in functions for data manipulation and summarization. Refactoring the code to be more concise using dplyr makes it easier to read and understand the steps involved in the data manipulation. It also reduces the number of nested loops and intermediate variables, making the code more efficient and reducing the chance of errors. Ultimately, it is your job as a programmer to use your best judgment on the fine balance between brevity and readability, as the best solution is not always this obvious. Remember that you can regenerate the AI’s response if you want more options. 4.10 Code optimization Sometimes time and efficiency are of the essence. Loops can be slow and time-consuming in many languages. Take for example the following R code that uses nested for loops to compute, for each element of x, the sum of its products with every element of y (the row sums of the outer product of x and y): set.seed(123) n <- 10000 x <- rnorm(n) y <- rnorm(n) z <- numeric(n) # Start timer start_time <- proc.time() for(i in 1:n) { for(j in 1:n) { z[i] <- z[i] + x[i]*y[j] } } # End timer proc.time() - start_time ## user system elapsed ## 210.361 0.068 210.452 What is a faster, more optimized way of running the following R code?
set.seed(123) n <- 10000 x <- rnorm(n) y <- rnorm(n) # Start timer start_time <- proc.time() z <- outer(x, y, "*") z <- rowSums(z) # End timer proc.time() - start_time ## user system elapsed ## 0.771 0.244 0.746 The outer() function performs the same calculation as the nested loop in the original code, but more efficiently. It returns a matrix of all possible combinations of x and y values, with each element of the matrix being the product of the corresponding x and y values. The rowSums() function is then used to sum the elements of each row of the matrix, which is equivalent to summing the products of x and y for each index i in the original loop. This method avoids the need for the nested loop, resulting in a faster and more efficient computation. So much faster! We can look at another option by regenerating the response: set.seed(123) n <- 10000 x <- rnorm(n) y <- rnorm(n) z <- numeric(n) # Start timer start_time <- proc.time() z <- x %*% t(y) # End timer proc.time() - start_time ## user system elapsed ## 0.334 0.248 0.357 One optimized way to perform the same calculation is by using the %*% operator to perform matrix multiplication. This can be done by converting x and y to matrices and transposing one of them so that their dimensions align for matrix multiplication. This code should be much faster than the original implementation because it takes advantage of highly optimized matrix multiplication algorithms in R. Note, however, that x %*% t(y) on its own returns the full n-by-n matrix of products rather than the vector z from the original loop, so rowSums() would still need to be applied to reproduce the same result. Always check that a refactored version returns the same result as the original. While this second suggestion is faster, you will need to consider what aspects of the codebase are most important in each instance. For example, this code runs more quickly, but the %*% operator might be unfamiliar to some R programmers. In cases where efficiency is less important, or the data are not large, you might consider maximizing readability. 4.11 Challenges and Limitations Like humans, AI can make mistakes. Be sure to keep the following in mind as you use AI to refactor your code.
4.11.1 Complexity Refactoring is often a complex process that requires deep understanding of the code and its context. AI may not always be able to fully understand the complexity of the code and may struggle to identify the best refactoring strategy. Use a modular approach whenever possible. 4.11.2 Limited Data AI models require large amounts of data to learn from, but in the case of code refactoring, there is often limited data available. This can make it difficult for AI models to generalize to new code bases and situations, especially if you are using a more niche programming language. 4.11.3 Quality Control Automated refactoring tools that use AI may not always produce code that is of the same quality as code produced by human developers. It can be difficult to always ensure that the refactored code is maintainable, efficient, and free of bugs. You need to use your best judgment when copying and pasting AI-produced code into your codebase. You should always include unit tests in your code. Tests can help you catch bugs, including those introduced accidentally by AI. Because AI models are created by humans, they can be biased. This means they may not always identify your preferred refactorings or may prioritize certain types of refactorings over others. In some cases, this can lead to suboptimal code quality and may create technical debt over time. 4.11.4 Security When using AI to refactor code, the code itself is often sent to an external service or platform for analysis and transformation. This can raise concerns about the security of the code, especially if it contains sensitive information such as trade secrets, proprietary algorithms, or personal data. If your code is sensitive, it’s important to carefully vet any third-party AI tools or services used in the refactoring process. 4.12 Hands-On Exercise Now it’s your turn to try. 4.12.1 The Code Let’s say you are dusting off some code from your past (no judgment here). You were investigating tweets about Mr. 
Trash Wheel, a beloved Baltimore-based contraption that filters trash out of the waterways. Note: This code is just an example and was written strictly for educational purposes. import tweepy import pandas # Enter your API keys and access tokens here consumer_key = 'your_consumer_key' consumer_secret = "your_consumer_secret" access_token = 'your_access_token' access_token_secret = 'your_access_token_secret' # Authenticate with Twitter API auth = tweepy.OAuthHandler(consumer_key, consumer_secret) auth.set_access_token(access_token, access_token_secret) # Search for tweets containing the search term tweets = tweepy.Cursor(tweepy.API(auth).search_tweets, q="Mr. Trash Wheel", tweet_mode='extended').items(100) # Create empty list to store tweet data date_data = [] location_data = [] text_data = [] def get_tweet_length(tweet): # Return the length of the tweet text return len(tweet.full_text) # Loop through each tweet and extract desired data for tweet in tweets: date_info = { 'date': tweet.created_at } date_data.append(date_info) # Search for tweets containing the search term tweets = tweepy.Cursor(tweepy.API(auth).search_tweets, q="Mr. Trash Wheel", tweet_mode='extended').items(100) # Loop through each tweet and extract desired data for tweet in tweets: location_info = { 'location': tweet.user.location } location_data.append(location_info) # Search for tweets containing the search term tweets = tweepy.Cursor(tweepy.API(auth).search_tweets, q='Mr. Trash Wheel', tweet_mode='extended').items(100) # Loop through each tweet and extract desired data for tweet in tweets: text_info = { 'text': tweet.full_text } text_data.append(text_info) # Combine lists into a dictionary data = {'date': date_data, 'location': location_data, 'text': text_data} # Store results in pandas dataframe df = pandas.DataFrame(data) # Print dataframe print(df) 4.12.2 Questions Create an AI prompt that fixes any formatting issues with the code that would cause it not to run. 
Devise an AI prompt that removes any dead code from your sample above. What gets removed? Create a prompt that makes the code less repetitive, adhering to the DRY principle. What aspect of the code was repetitive? Construct a prompt that makes the code more concise. What are some trade-offs that appear in this code between readability and brevity? 4.13 Summary Code refactoring is the process of improving code quality without changing its functionality. It is crucial in software development to maintain a manageable and adaptable codebase. Code refactoring reduces technical debt, improves code stability, and makes it easier to maintain. Examples of using AI for code refactoring include correcting syntax, adhering to styling and convention, visual styling, avoiding repetition, removing dead code, and improving both readability and speed of execution (optimization). The use of AI for code refactoring raises ethical concerns and is not perfect. It is important for the developer to consider security needs of their code, as well as test out their code. "],["annotating-your-code.html", "Chapter 5 Annotating Your Code 5.1 Learning Objectives 5.2 Annotating Code is Good Practice 5.3 Improves readability 5.4 Improves maintainability 5.5 Improves the quality 5.6 Examples of quality aiding comments 5.7 AI in Code Annotation 5.8 Ethics of using AI to annotate code 5.9 Examples on using AI for annotation 5.10 Summary", " Chapter 5 Annotating Your Code 5.1 Learning Objectives Explain the significance and benefits of code annotation and how it aids in understanding and working with code in the future. Demonstrate the ability to apply effective code annotation practices by providing clear and concise comments in code. Recognize the impact of code annotation on code maintainability. Explain how well-annotated code can facilitate collaboration among developers and ensure code consistency. 
Critically evaluate the use of AI in code annotation, including benefits, ethical considerations, and limitations. Explore how AI can be used to create README files, do line-by-line annotations, and offer potential code improvements. 5.2 Annotating Code is Good Practice Code annotation, also known as code commenting, is the process of adding explanatory notes to source code. These notes are used to provide context, clarify functionality, and aid in understanding for developers who may be working on the codebase in the future. Code annotation is an important practice for any developer looking to write clear, efficient, and maintainable code. There are many benefits to annotating code: it improves readability, improves maintainability, and improves quality. The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information is protected, are unknown. Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure. 5.3 Improves readability First, it can help to improve the readability of code. By adding comments, developers can explain the purpose of each section of code, which can make it easier for others to understand what the code is doing. This can be especially helpful for large or complex codebases, where it can be difficult to keep track of all the different components. This is particularly important in contexts where the original developer of a project may move on to something else and others are left to work on it.
Annotation perhaps most often helps a developer remember things they knew about the code when they were originally writing it (annotation is helpful for future you!). If the original developer left well-annotated code, it can drastically improve the ability of others (and their future selves) to continue with the project. 5.3.1 Examples of readability aiding comments Comments that help readability clarify what the code is doing. Function explanations: A comment at the beginning of a function or method can describe its purpose, parameters, and expected return values. This makes it easier for others to understand what the function does and how to use it. # This function calculates and returns the sum of two numbers (x and y) def add_numbers(x, y): return x + y Inline explanations: Use inline comments to explain what a specific line or block of code does. This can help someone reading your code to quickly understand what’s going on. # Loop through each item in the list for item in my_list: # Check if the item is greater than 10 if item > 10: # Print the item print(item) TODO comments: Use TODO comments to mark areas of your code that need further work or improvement. This can help you or others to remember to come back to a specific part of the code. # TODO: Implement error handling for this function def my_function(): pass Parameter descriptions: If a function or method has complex parameters, it can be helpful to add comments explaining what each parameter does. def my_function(parameter1, parameter2): """ Calculate the sum of two numbers. Parameters: parameter1 (int): The first number to be added. parameter2 (int): The second number to be added. Returns: int: The sum of parameter1 and parameter2. """ return parameter1 + parameter2 Code block summary explanations: If you have a long or complex code block, you can add a comment to explain what the block is doing.
# This code block creates a dictionary containing the counts of each word in a list of text strings word_counts = {} for text in text_list: for word in text.split(): if word not in word_counts: word_counts[word] = 0 word_counts[word] += 1 5.4 Improves maintainability Code annotation can help to improve the maintainability of code. By adding comments, developers can explain the reasoning behind certain decisions, which can make it easier for others to make changes to the code without breaking it. This can be especially helpful when multiple developers are working on the same codebase, as it can help to prevent conflicts and ensure that the code is always in a consistent state. 5.4.1 Examples of maintainability aiding comments Comments that help maintainability explain the historical context of why the code was written the way it was. Design decisions: If there were specific design decisions made when creating the code, you can add comments explaining why certain choices were made. # We chose to use a linked list data structure for this function to reduce the time complexity of inserting and deleting elements. Legacy code: Sometimes, code may have been written in a certain way due to constraints or limitations at the time it was created. Adding comments to explain this can help others understand why the code is the way it is. # This code was written before Python 3.3, which introduced the `yield from` syntax. Therefore, we used a `for` loop to iterate over the nested list. Performance optimizations: If certain performance optimizations were made to the code, you can add comments explaining why they were necessary. # We used memoization to improve the time complexity of this recursive function, as it was taking too long to execute for larger inputs. Compatibility considerations: If the code was written with compatibility considerations in mind, you can add comments explaining why certain choices were made.
# We used the `os.path` module to ensure that this code will work on both Windows and Unix-based systems, as the path separators are different on each platform. Limitations: If there are limitations or edge cases that the code cannot handle, you can add comments to explain this to others. # Note that this function assumes that the input array is sorted in ascending order. If the array is unsorted, the results may be incorrect. 5.5 Improves the quality Code annotation can help to improve the quality of code. By adding comments, developers can identify potential issues or edge cases, which can help to prevent these issues from occurring in the first place. This can be especially helpful when testing code, as it can help to ensure that the code is working as expected. 5.6 Examples of quality aiding comments Comments that help improve the quality of the code by explaining to others how to use it or help improve it Error handling: Adding comments to explain how and why error handling is being implemented can help ensure that your code is robust and able to handle unexpected inputs or errors. This helps others know how the code was intended to be used. # If the input argument is not a list or is empty, raise a ValueError if not isinstance(input_list, list) or len(input_list) == 0: raise ValueError("Input must be a list that is not empty") Complexity: If your code has particular complexities, adding comments that explain it can help others understand the performance characteristics of your code. It may help others identify whether there is a simpler way to write the code. By adding a comment that expresses uncertainty about the code and asking for suggestions, the author can potentially receive feedback from others on how to improve the code. # This block of code could probably be simplified, but I'm not sure how. 
new_list = [] for i in old_list: if i > 0: new_list.append(i) Constants and variables: Adding comments to explain the purpose of constants and variables can make your code easier to use. It not only notifies others of the variable's existence but lets them know if they need to change the parameters for their own purposes. # This constant represents the maximum allowed number of retries when attempting to connect to the server. MAX_RETRIES = 3 # This variable tracks the number of failed attempts to connect to the server. num_retries = 0 Code organization or disorganization: Adding comments to explain the organization and structure of your code can make it easier for others to navigate and may bring out a better organizational strategy for your code. # Define helper functions for data processing def preprocess_data(input_data): # Implementation details def analyze_data(processed_data): # Implementation details def visualize_data(analyzed_data): # Implementation details # Load data from file input_data = load_data("input.txt") # Preprocess data processed_data = preprocess_data(input_data) # Analyze data analyzed_data = analyze_data(processed_data) # Visualize data visualize_data(analyzed_data) Assumptions and constraints: Adding comments to explain the assumptions and constraints of your code can help others understand its limitations and potential pitfalls. # This function assumes that all input values are positive integers. Negative or non-integer inputs may result in unexpected behavior. 5.7 AI in Code Annotation We’ve discussed why code annotation and documentation are important, but when and why would you use AI for code annotation? AI can be a nice tool to supplement the annotation of your code. It should not be the only source of annotation for your code, because as the code author, you need to verify that what AI has annotated is consistent with your knowledge and history of development of the code.
But using AI to annotate your code can be useful to supplement existing code annotations or to annotate old code that was poorly annotated either by yourself or others who are no longer working on the project. Here are some of the benefits of using AI for code annotation: Speed and efficiency: AI algorithms can analyze code much faster than humans, which means they can quickly generate comments and annotations for large codebases. This can save developers significant time and effort, allowing them to focus on other aspects of the development process. Consistency: Unlike humans, AI is not affected by personal biases or preferences, so it can provide consistent annotations across different code files and projects. A human may underestimate places in the code that should have annotation, whereas an AI might be more consistent at putting annotation in these places. This can help ensure that all code in a project is well-documented and easy to understand. Objectivity: AI can analyze code objectively and identify potential issues that may have been missed by humans. This can lead to better quality code that is easier to maintain and less prone to errors. Learning: AI models can learn from large datasets of annotated code and improve their ability to generate comments and annotations over time. This means that the more code the AI model analyzes, the more accurate and effective it becomes at annotating code. Specificity: AI models can be trained on specific programming languages, frameworks, or libraries, allowing them to generate language-specific comments and annotations that are tailored to the needs of the project. This can be particularly helpful for large, complex projects that require specialized knowledge or expertise. 5.8 Ethics of using AI to annotate code However, there are also a number of potential ethical concerns associated with using AI to annotate code. For example, we don’t yet know how and in what ways AI models may be biased. 
Additionally, AI models may be opaque, which could make it difficult to understand why they made certain annotations. Finally, because AI models are not humans and don’t necessarily tell the stories behind the code, they may generate annotations that are inaccurate, unhelpful, or missing the full history of what occurred with the code, which could lead to problems. Given these potential benefits and concerns, it is important for users of AI to always realize that they are ultimately accountable for the annotation that an AI model makes, and careful review of this annotation is needed. There are a number of ethical considerations to take into account when using AI to annotate code. Here are a few of the most important considerations: Accountability: First and foremost, a user of AI is always primarily accountable for any output that they use from an AI model. AI models can give you annotation to start with, but it is up to you as the user to verify and review this output carefully. The user of the AI model is ultimately responsible for keeping or throwing out the annotations the AI makes and thus is responsible for any errors from the model that they keep. Much like a user of a search engine such as Google is responsible for which results they use, a user of AI is responsible for which output from the AI model they use. Transparency: It is important to be transparent about the use of AI in code annotation. This means disclosing the fact that AI is being used, as well as the specific AI model and dataset that are being used. This should be stated on every file where annotation has been made using AI. This transparency allows others who view the code to more fully interpret the AI-created annotation that accompanies the code. Bias: AI models are trained on data, and this data can introduce bias into the model. It is important to be aware of the potential for bias and to take steps to mitigate it.
This can be done by using a diverse dataset, by carefully selecting the features that are used to train the model, and by using techniques such as adversarial training. It is important to provide annotations that indicate any known biases, possible limitations associated with bias, and any strategies that were used to mitigate bias. Explainability: It is important to be able to explain the decisions that an AI model makes. This is especially important in the case of code annotation, where the decisions made by the AI model can have a significant impact on the quality of the code. There are a number of techniques that can be used to explain the decisions of an AI model, such as visualization and rule extraction. You can (and should) ask a chatbot to explain its sources and rationale for output that it gave. (Be aware that the citations it gives may or may not be accurate, and you need to verify them by doing your own follow-up literature search.) 5.9 Examples on using AI for annotation 5.9.1 Ask for a README summary To try out some annotation with code examples, this author dug up some old code whose purpose she no longer remembered, because time had passed and she didn’t document it very well. First, I wanted ChatGPT to make a README for me. A README is a standard file that explains what the code base does and provides a summary of what is happening in the set. I asked ChatGPT if it could: Create a README for this code and then I pasted the old code. ChatGPT gave me some output that generally looked like a README: This README seems accurate given the work I was doing at the time. But obviously the title isn’t great: it doesn’t describe the code well. Plus there’s not really any information about usage or outputs of the code. So I asked it: Can you make this README more succinct and with sections about how you would use this code? This yielded a much better result.
5.9.2 Ask for line by line annotation Second, I wanted to have ChatGPT annotate code line by line. So I asked it: Can you annotate some R code for me? It not only provided helpful annotations but also fixed spacing and added more spaces between lines of code: It does, however, apparently think the code is Python, if we look in the upper left corner. But this doesn’t seem to have affected the accuracy of its annotations. 5.9.3 Ask for how the code could be improved This code is old, I don’t think it runs properly, and it is in no way reproducible. So I wanted to see what ChatGPT would recommend for improvements to the code. I said: How could I make this code better? It had a lot of great advice: First and foremost, it told me my variable names are not consistent and also not descriptive, which is completely accurate! There are variable names like xx and ll which don’t tell the reader anything about the variable. 5.9.4 Style the code Next, I asked ChatGPT if it could restyle the code into tidyverse style. It definitely added elements that look like tidyverse; however, it also put items in the code that I can tell by looking at them won’t work. So I tried to tell it about the errors: This code is close but there are some errors. These tidyverse functions don’t work with matrices. and it told me: You are correct that the tidyverse functions are designed to work with data frames and not matrices. Here's a modified version of your code that converts the matrix to a data frame and uses tidyverse functions: The lesson here is that the output of ChatGPT still needs to be vetted by the person asking for it. It is a great idea to keep having an AI work on something if the result isn’t quite what you are asking for; however, at the end of the day it is you, the human, who has to vet the output. 5.9.5 Sometimes it has trouble with file types if given a URL Copying and pasting code appears to be the best way to go at this point in time.
I asked ChatGPT if it could annotate code in a Jupyter notebook at a URL I gave it: Can you annotate the code at this URL? https://github.com/jhudsl/reproducible-python-example/blob/main/make_heatmap.ipynb It responded with an annotated version of some of the code in that file, but most of the notebook was missing. I told ChatGPT: That isn’t all the code at that url ChatGPT told me: You're right, I apologize. The code at the given URL is actually in a Jupyter Notebook format, so it's not easy to copy-paste here. However, I can suggest an alternative approach: you can use the website nbviewer (https://nbviewer.jupyter.org/) to view the notebook and its code. Here are the steps to do so: Go to https://nbviewer.jupyter.org/ Paste the URL of the notebook https://github.com/jhudsl/reproducible-python-example/blob/main/make_heatmap.ipynb into the "Enter the location of a Jupyter Notebook" field and press "Go". This will display the notebook and its code, and you can read through it and add your own annotations as needed. I hope this helps! This was a nice tool to recommend; however, it still told me I should add my own annotations at the end of this. 5.10 Summary Code annotation is important for readability, maintainability, and overall quality of code. AI can be a helpful tool for supplementing annotations or creating annotation for poorly annotated code. Ultimately the code author who uses AI as a tool for annotation is still responsible for verifying that the annotations are consistent with their knowledge of the code.
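As a recap, the short sketch below combines several of the comment types described in this chapter (a parameter description, an error-handling note, an explained constant, and a stated assumption) with a disclosure header of the kind suggested under Transparency. The function name `count_words`, the constant `MAX_WORDS`, and the wording of the header are hypothetical and only meant as an illustration, not a required format.

```python
# NOTE: Some comments in this file were drafted with an AI chatbot and then
# reviewed by the author. (Hypothetical disclosure header, for illustration.)

# Maximum number of distinct words to report; change this for your own use.
MAX_WORDS = 3


def count_words(text_list):
    """Count how often each word appears in a list of text strings.

    Parameters:
        text_list (list of str): The strings to scan.

    Returns:
        dict: Up to MAX_WORDS of the most frequent words and their counts.
    """
    # If the input is not a list or is empty, raise a ValueError so that
    # misuse is caught early.
    if not isinstance(text_list, list) or len(text_list) == 0:
        raise ValueError("Input must be a list that is not empty")

    # Assumption: words are separated by whitespace; punctuation is not stripped.
    word_counts = {}
    for text in text_list:
        for word in text.split():
            word_counts[word] = word_counts.get(word, 0) + 1

    # Sort by count (highest first), breaking ties alphabetically.
    ranked = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:MAX_WORDS])
```

Whether the comments were written by you or drafted by a chatbot, each one should still be checked against what the code actually does.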
devtools::session_info() ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2023-07-17 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2023-03-28 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.0.2) ## cachem 1.0.7 2023-02-24 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## curl 4.3 2019-12-02 [1] RSPM (R 4.0.3) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) ## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) ## evaluate 0.20 2023-01-17 [1] CRAN (R 4.0.2) ## fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## highr 0.8 2019-03-20 [1] RSPM (R 4.0.3) ## hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0) ## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.0.2) ## httr 1.4.2 2020-07-20 [1] RSPM (R 4.0.3) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2023-03-28 [1] Github (yihui/knitr@a1052d1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.0.2) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## ottrpal 1.0.1 2023-03-28 [1] Github (jhudsl/ottrpal@151e412) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.3) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 
2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 [1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## readr 1.4.0 2020-10-05 [1] RSPM (R 4.0.2) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.0 2023-03-14 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2023-03-28 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.2) ## sass 0.4.5 2023-01-24 [1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2023-03-28 [1] Github (R-lib/testthat@e99155a) ## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.0.2) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## utf8 1.1.4 2018-05-24 [1] RSPM (R 4.0.3) ## vctrs 0.6.1 2023-03-22 [1] CRAN (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2023-03-28 [1] Github (yihui/xfun@74c2a66) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["understanding-unfamiliar-code.html", "Chapter 6 Understanding Unfamiliar Code 6.1 Learning Objectives 6.2 Reading Unfamiliar Code Is A Skill 6.3 AI in Reading Code 6.4 Example One: Summarizing Code You Didn’t Write 6.5 Example Two: Identifying a Coding Language 6.6 Example Three: Interpreting Regex 6.7 Example Four: Demystifying Functions 6.8 Limitations 6.9 Hands-On Exercise 6.10 Summary", " Chapter 6 Understanding Unfamiliar Code 6.1 Learning Objectives Recognize the benefits, limitations, and assumptions in using AI to understand unfamiliar code Consider the ethical questions around using AI to understand another person’s code Practice using AI to summarize code, identify an unknown coding language, and interpret regex and unknown functions 6.2 Reading Unfamiliar Code Is A Skill As a programmer, you will frequently encounter situations where you need to read and understand code written by other developers. 
This could be because you are collaborating on a project, you have taken over maintenance of an existing codebase, or you are learning new syntax or skills that require reading example codes. Comprehending code written by someone else is a distinct skill from being able to write your own code, and it takes practice to become proficient at it. Reading unfamiliar code is like exploring an unfamiliar city without a map or a guide. Just like in a new city, you may not know where to start or how to navigate the codebase. You may encounter unfamiliar syntax, functions, and libraries that you have never seen before, just as you might encounter new streets, buildings, and landmarks. At first, you may feel disoriented and overwhelmed, and may need to spend some time getting oriented and familiarizing yourself with the environment. As you explore, you may start to see patterns and similarities, just as you might begin to recognize neighborhoods and landmarks in a new city. You may also encounter dead-ends, confusing intersections, and unexpected detours, just as you might encounter bugs and errors in the code. To make progress in this unfamiliar territory, you may need to rely on a combination of intuition, deduction, and experimentation. You may need to break down the code into smaller pieces, analyze the behavior of each piece, and gradually build up a mental model of how the code works. You may also need to consult documentation, online resources, or other experts who are familiar with the codebase, just as you might ask locals or consult a map in a new city. In the end, with patience, persistence, and a willingness to learn, you can successfully navigate the unfamiliar code and achieve your goals, just as you can successfully explore a new city and discover its hidden treasures. Learning how to read and understand code written by someone else is a valuable skill that can help you be a more effective programmer. 
By being able to read and comprehend unfamiliar code, you can contribute to projects more quickly, diagnose and fix bugs more effectively, and learn new programming techniques and best practices. As with any skill, the more you practice reading and understanding code written by others, the more comfortable and proficient you will become at it. 6.3 AI in Reading Code While anyone can learn to understand code written by someone else, using AI language models to achieve this has three significant advantages. Summarize large codebases. AI can quickly scan large codebases and give you a high-level overview of what they do. This can save you a lot of time and effort, especially when dealing with complex codebases. For example, AI can identify the main functions and classes in a codebase, and it can show you how they are related. Analyze the code. AI can look at the code and tell you about its structure, complexity, and potential defects quickly and efficiently. This can help you understand the code better and make it easier to maintain and improve. For example, AI can identify parts of the code that are difficult to understand or maintain, and it can suggest ways to improve the code. Understand comments and documentation. AI can read comments and documentation written by other developers. This can help you understand the code’s purpose and the assumptions and limitations of the codebase. For example, AI can identify the key assumptions that are made in the code, and it can explain the trade-offs that were made in the design of the code. Keep these ethical considerations in mind when using AI to summarize code. Who owns the code? If you are not the owner of the code, you should get permission from the owner before summarizing it. What is the purpose of the summary? If you are summarizing the code for educational or research purposes, it is likely ethical to do so. 
However, if you are summarizing the code for commercial purposes, you may need to get permission from the owner. Does the code contain any sensitive or personal information? If the code contains any sensitive or personal information, you should take special care to protect that information. The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information is protected, are unknown. Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure. 6.4 Example One: Summarizing Code You Didn’t Write Here’s some simple code using the R programming language, written by an unnamed programmer. If you are familiar with R, you might be able to understand what this code does with little effort. However, if you don’t know R, you’re relatively new to R, or you’re just having a bad day, understanding what this bit of code does can take a great deal more energy and time. AI language bots can ease that burden. library(tidyverse) d <- ggplot2::diamonds d_sel <- d %>% select(carat, cut, color, clarity, depth, table, price) d_fil <- d_sel %>% filter(price > 5000) d_bin <- d_fil %>% mutate(carat_bins = cut_width(carat, width = 0.2)) d_summ <- d_bin %>% group_by(carat_bins, cut, color, clarity) %>% summarize(mean_price = mean(price), sd_price = sd(price)) We first ask Claude-instant (as implemented by Poe) to explain the purpose of this code. What does this code do? You can ask for Claude to summarize each step in detail as well. Can you explain what each step does in detail?
Maybe you’ve never worked with the tidyverse package in R - Claude can tell you about it. What is the tidyverse package? You might also not be familiar with the diamonds dataset (especially if you’ve never used it before) and want some clarification on where the data are from. Can you tell me more about the diamonds dataset? 6.5 Example Two: Identifying a Coding Language Sometimes you might have to work with legacy code. Legacy code can be difficult to work with, especially if it is written in a language or style with which you are not familiar. There are more than 700 programming languages in use today, so it is impossible for any programmer to know them all. AI can be a helpful tool for identifying the language and version of legacy code, which can make your life just a little easier. Let’s look at example code that might have been written decades ago. program temperature_smog_analysis; uses Math; const n_temperatures = 1000; n_smog_measures = 500; type TemperatureArray = array[0..n_temperatures-1] of integer; SmogArray = array[0..n_smog_measures-1] of integer; var temperatures: TemperatureArray; smog_measures: SmogArray; combined_data: array of integer; slope, y_intercept: double; i, j: integer; function connect_to_database(filename: string): integer; begin { implementation of connect_to_database function } end; function retrieve_temperatures(fd: integer): integer; begin { implementation of retrieve_temperatures function } end; function retrieve_smog_measures(fd: integer): integer; begin { implementation of retrieve_smog_measures function } end; function retrieve_temperature(fd, index: integer): integer; begin { implementation of retrieve_temperature function } end; function retrieve_smog_measure(fd, index: integer): integer; begin { implementation of retrieve_smog_measure function } end; procedure close_database(fd: integer); begin { implementation of close_database function } end; We can ask Bard to take a guess at what the coding language might be. 
What language is this code written in? We can explore further to figure out what indicators tell Bard the language is Pascal. How do you know this is Pascal? It turns out there are multiple versions of Pascal. Since you will presumably need to work with this legacy code, you might want to know which version it is. What version of Pascal is this? 6.6 Example Three: Interpreting Regex Regex (short for Regular Expressions) is a powerful tool used to describe patterns in text. A regular expression is a sequence of characters that defines a search pattern. Regular expressions are useful for data cleaning, text mining, and data validation. They are widely used in web development, data science, and other fields where text processing is important. They can also be tricky to understand at first because they involve a specific syntax that can be complex. We can use ChatGPT to explain regex in a bit of someone else’s code. x = sub("/d/e","/d",x) What does this do? 6.7 Example Four: Demystifying Functions Sometimes we get handed code that includes complex architecture. Perhaps you are working with code you wrote years ago when you really liked loops, but present you finds them difficult to parse. (Be nice to past you - maybe you were a baby programmer and didn’t know better.) Instead of giving yourself a headache, you could turn to ChatGPT to explain what your old code does. def my_function(x): result = x for i in range(10): for j in range(5): result = result + 2 * (i + 1) * (j + 1) * (i % 2 == 0 and j % 2 == 0) - 1 return result What does this function do? Well, now you have an idea what past-you might have been attempting to do with this code. You can also query AI about the potential problems from using the code as-written. What are some possible issues with this code? (And if you do ever run across an expression like this in your code, you can check out our chapter on refactoring code using AI to fix it!)
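One way to vet a chatbot's explanation of the function above is to run the original next to a simplified rewrite and confirm that the two agree. The sketch below assumes the loop structure shown in the example; `simplified_function` is our own hypothetical rewrite for illustration, not output from ChatGPT.

```python
def my_function(x):
    """Original function from the example, with indentation restored."""
    result = x
    for i in range(10):
        for j in range(5):
            # The boolean acts as a 0/1 switch: the product is only added
            # when both i and j are even; 1 is subtracted on every pass.
            result = result + 2 * (i + 1) * (j + 1) * (i % 2 == 0 and j % 2 == 0) - 1
    return result


def simplified_function(x):
    """Hypothetical rewrite: same arithmetic with the condition made explicit."""
    result = x
    for i in range(0, 10, 2):      # even i only
        for j in range(0, 5, 2):   # even j only
            result += 2 * (i + 1) * (j + 1)
    return result - 10 * 5         # one subtraction per original iteration


# Compare the two versions on a handful of inputs.
for value in (-3, 0, 7, 2.5):
    assert my_function(value) == simplified_function(value)
```

Because the loop bounds are fixed, both versions reduce to adding the constant 400 to `x`. A quick check like this tells you whether the chatbot's summary matches what the code actually computes.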
6.8 Limitations Chatbots can be extremely helpful for understanding someone else’s code, but when annotation or documentation is lacking, it is difficult to discern the context in which the code was developed. Cases where developers may not have provided context include: Not stating their purpose for writing the code. Not describing what they planned to do next. Not describing (or not being aware of) parts of the code that need updating or that may have security or privacy issues. Not describing how they made certain decisions in the development process. Although we can’t fully recover this information, chatbots can still offer some assistance with prompts such as: What do you think the purpose of this code is? What might be possible next steps to further develop this code? What are possible issues with this code, particularly for security or privacy? Why was the code possibly written with this structure? What other options are possible? Chatbot tools are also limited by how up to date their training data are, so they may not know about recently discovered issues with code. 6.9 Hands-On Exercise Now it’s your turn to try. Let’s say you were handed this bit of code to work with (and eventually modify), but first you need to figure out what’s going on with it. Unfortunately, the original programmer left very little in the way of notes or annotation. 6.9.1 The Code Note: This code is just an example and was written strictly for educational purposes.
use ode_solvers::{Euler, OdeMethod}; use plotters::prelude::*; use std::fs::File; const N: f64 = 1000000.0; const I0: f64 = 10.0; const R0: f64 = 0.0; const S0: f64 = N - I0 - R0; const BETA: f64 = 0.3; const GAMMA: f64 = 0.1; fn sir_model(t: f64, y: &[f64], v: f64) -> Vec<f64> { let s = y[0]; let i = y[1]; let r = y[2]; let dsdt = -BETA * s * i / N - v * s; let didt = BETA * s * i / N - GAMMA * i; let drdt = GAMMA * i + v * s; vec![dsdt, didt, drdt] } fn main() { let mut solver = Euler::new(sir_model); solver.set_initial_condition(&[S0, I0, R0]); let root = BitMapBackend::new("sir_vaccination.png", (800, 600)).into_drawing_area(); root.fill(&WHITE).unwrap(); let mut chart = ChartBuilder::on(&root) .caption("SIR model with vaccination", ("sans-serif", 40)) .set_label_area_size(LabelAreaPosition::Left, 60) .set_label_area_size(LabelAreaPosition::Bottom, 40) .build_cartesian_2d(0.0..100.0, 0.0..N) .unwrap(); let mut data1 = vec![]; for _ in 0..1000 { let y = solver.integrate(0.1, 0.0); data1.push((solver.time(), y[0], y[1], y[2])); } let vac_rate = 0.1; let vac_coverage = 0.5; let vac_num = vac_coverage * N; let mut data2 = vec![]; for i in 0..1000 { let t = i as f64 * 0.1; let v = if t >= 50.0 && t < 150.0 {vac_num * vac_rate} else {0.0}; let y = solver.integrate(0.1, v); data2.push((solver.time(), y[0], y[1], y[2])); } chart .configure_mesh() .x_labels(10) .y_labels(10) .disable_x_mesh() .disable_y_mesh() .draw() .unwrap(); chart .draw_series(LineSeries::new( data1.iter().map(|d| 6.9.2 Questions Write a query that identifies the features that can be used to determine the coding language, as well as the version of the language. What language and version is it written in? What features give away the coding language? Create a prompt that determines what this code does, as well as important features of the program. What does this code do? 
Write a prompt that determines the purpose of this specific code snippet: let root = BitMapBackend::new("sir_vaccination.png", (800, 600)).into_drawing_area(); What is the purpose of this line of code? 6.10 Summary Understanding code written by someone else can be made easier by using AI AI chatbots can summarize the code and the dataset used, as well as clarify functions and expressions that are confusing AI chatbots can also identify the coding language and version used in legacy code Be mindful of who owns the code and what the summary will be used for when using AI to summarize someone else’s code "],["ai-for-bioinformatics.html", "Chapter 7 AI for Bioinformatics 7.1 Learning Objectives 7.2 AI in Bioinformatics 7.3 Using these topics for Bioinformatics 7.4 Hands-On Exercise 7.5 Summary", " Chapter 7 AI for Bioinformatics 7.1 Learning Objectives Understand the potential applications of AI assistance in bioinformatics software development Identify the challenges and considerations when using AI in bioinformatics software development Grasp the serious ethical concerns when designing bioinformatics software that deals with sensitive data 7.2 AI in Bioinformatics Bioinformatics is an interdisciplinary field that combines biology, computer science, statistics, and data analysis to extract insights from biological data. It involves the development and application of computational tools, algorithms, and databases to store, organize, analyze, and interpret vast amounts of biological information, particularly in the context of genomics and public health. Biomedical imaging and electronic health record informatics are also rapidly growing subdisciplines. Bioinformatics is especially important for research and work done with genomes. 
In genomics, bioinformatics enables the identification and annotation of genes, the prediction of protein structures and functions, and the analysis of genetic variations, such as single nucleotide polymorphisms and gene copy number variations. These analyses help in unraveling the genetic basis of diseases, discovering potential drug targets, and facilitating personalized medicine approaches. The proliferation of genomic data in recent years has been transformative for the field of genomics. Advances in DNA sequencing technologies, particularly next-generation sequencing, have revolutionized the ability to generate vast amounts of genomic data quickly and at a reduced cost. This has led to an unprecedented increase in the availability and accessibility of genomic information. Researchers estimate that we will generate between 2 and 40 exabytes of genomic data within the next decade. Cloud computing platforms like AnVIL are at the forefront of addressing the big genomic data storage and processing challenge. You have seen principles in previous chapters that likewise apply to creating bioinformatics software. Creating bioinformatics software, however, comes with its own unique set of challenges: Data Complexity: Genomic data is vast and complex, often consisting of large datasets with multiple dimensions, such as nucleotide sequences and quality scores, gene expression levels, and epigenetic modifications. Developing algorithms and software tools to efficiently handle and analyze this data requires expertise in data management, processing, and interpretation as well as biological domain knowledge. Diverse Data Types: Bioinformatics software often handles diverse and complex data beyond strings and numeric types, such as Bioconductor’s Single Cell Experiment data class. Knowledge of common data types used in bioinformatics is required to create functional software.
Algorithmic Complexity: Developing accurate and efficient algorithms for various bioinformatics tasks, such as sequence alignment, gene prediction, and protein structure prediction, requires a deep understanding of biological principles and computational techniques. Designing algorithms that can handle noise, errors, and biological variations is a persistent challenge. Integration with Existing Workflows: Bioinformatics software is often part of a larger analysis pipeline or workflow. Integrating AI-based tools seamlessly into existing bioinformatics pipelines and workflows, ensuring compatibility, and maintaining data integrity throughout the process requires special attention. Ethical Considerations: People are typically more sensitive about their health data than about their Amazon purchases. This is for a good reason: there is a painful history of the medical establishment performing experiments on people with less socio-political power. Like medical record data, genomic data is supposed to be carefully protected so that it can’t be used to disadvantage certain individuals. AI tools, on the other hand, are not typically designed with security and data sovereignty in mind. 7.3 Using these topics for Bioinformatics When using AI to help generate code examples, perform refactoring, understand code, or create unit tests, you should keep the following in mind: Writing Code: AI can be used to automatically generate bioinformatics code snippets or functions. It can provide code suggestions, auto-completion, or even generate code based on given specifications or data classes. This can include creating some unit tests for your functions to ensure they are working as expected. Refactoring Code: AI can aid in refactoring bioinformatics code to improve readability, maintainability, and performance through suggestions for code restructuring. Annotating Code: Bioinformatics tools can be used for clinical decision making.
It’s important that anyone vetting or auditing your tool can trace any decision making steps. Therefore, clear understanding and documentation are key. AI can assist in drafting documentation and code annotations, such as through summarizing code functionality, clarifying complex expressions or functions, and/or providing plain-terms explanations of the underlying algorithms. Understanding Code: Although bioinformatics is rapidly evolving, its often specific use cases mean that sometimes developers have to work with code that is quite old or outdated. AI can help identify the coding language and version used in legacy bioinformatics code. This can be valuable when working with older software or when integrating different codebases. NEVER submit chunks of code to AI tools without being 100% sure that they are free from personally identifiable information (PII). PII includes variables like gender, race, birth date, geographic location, and disease diagnosis. If you need to, generate dummy data to test your code. 7.4 Hands-On Exercise Now it’s your turn to try. 7.4.1 The Code You are working on improving an alignment algorithm. Alignments are typically used for comparing samples from individuals to a reference consensus sequence. To make sure things are working, you want to compare the alignment created by an existing function (pairwiseAlignment) to an alignment created by your tool (my_new_Aligner). Note: This code is just an example and was written strictly for educational purposes. 
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") if (!requireNamespace("Biostrings", quietly = TRUE)) BiocManager::install("Biostrings") library(Biostrings) reference <- DNAString("ATCGTACGTA") query <- DNAString("ATCGTAGCTA") # Perform pairwise sequence alignment alignment <- pairwiseAlignment(reference, query) # Perform sequence alignment using your algorithm my_alignment <- my_new_Aligner(reference, query) # Compare pairwise alignment to my_alignment my_evaluation <- my_alignment_evaluator(reference, query, alignment, my_alignment) 7.4.2 Questions Things are going great with the toy data! You decide to make a tiny subset of your data for testing your aligner by copying the top 100 sequences from patient 1. You then submit the aligner and data to Bard. You prompt, “is there anywhere handling this data can be made more efficient?” Why is this problematic? Can AI be used to improve annotation in the code above? Why or why not? You prefer writing loops, but have learned recently that R works more efficiently using apply() functions. Is it okay to ask ChatGPT to refactor a small loop into an apply() function? Why or why not? A colleague gives you a flash drive with some code from years ago. They swear it contains some useful snippets that will improve your aligner, but it’s in a language you haven’t worked with before. You inspect the code, and don’t believe it contains protected data. The algorithm was published many years ago in a leading journal. Is it okay to ask AI what the code is doing? 7.5 Summary Like other types of software, bioinformatics software and tools can benefit from AI assistance for writing, refactoring, annotating, and understanding code. Biological data and workflows can be complex. AI does not replace a thorough understanding of biological processes at play. NEVER submit any protected or sensitive data to AI applications. 
This can constitute a serious, often illegal breach of subject privacy if you are dealing with data from people. "],["languages-and-cloud-computing.html", "Chapter 8 Languages and Cloud Computing Programming Languages AI and Cloud Computing", " Chapter 8 Languages and Cloud Computing Programming Languages Here are commonly used languages and their typical use cases. Python Known for its readability, simple syntax, and comprehensive add-on modules. It is often recommended as a beginner-friendly language. Python is used for scientific computing, data analysis, machine learning, and web development. Many data scientist roles rely on Python. Ruby Similar to Python, Ruby is also known for its readability and user-friendly syntax. It is typically used for web development, building dynamic websites, and creating web applications. JavaScript A popular language for web development, known for its versatility and ease of use. Many of the interactive features of websites rely on JavaScript. Java Java is widely used for enterprise applications, Android mobile apps, and web apps. PHP PHP is easy to learn but can become complex as the application grows. It is typically used for building dynamic web pages, web applications, and content management systems. R Used for statistical computing, data analysis, and data visualization, R is a popular choice for researchers and data scientists. Some beginners find its syntax complex, especially due to differences in the two major dialects: “base” R versus the “tidyverse”. Kotlin Kotlin is a relatively new language that is designed to be more concise and expressive than Java. However, its advanced features and concepts may be challenging for beginners. It is used for developing Android mobile apps, web development, and server-side development. Swift Used for iOS and macOS app development, Swift is known for its strict syntax and steep learning curve.
C# Often used for Windows desktop applications, web applications, and video games, C# can be challenging due to its more complex syntax and strict type system. C++ A powerful language often used for developing high-performance applications and operating systems, C++ can be challenging to learn due to its complexity and the need to manage memory manually. Julia Julia is designed for numerical and scientific computing, data analysis, and machine learning. It has a syntax that is easy to learn and read, making it an ideal choice for data scientists, engineers, and researchers. Julia code can also be compiled to native machine code using the Julia compiler. Because Julia is a relatively new language, it has fewer add-on libraries and resources available compared to more established languages. MATLAB MATLAB is a high-level programming language and development environment commonly used for numerical computation, data analysis, and visualization. It is widely used in engineering, physics, and finance, among other fields. It is generally considered to have a user-friendly syntax and is relatively easy to learn compared to other languages. MATLAB has a development environment called MATLAB Compiler that allows you to package MATLAB code into standalone executables or shared libraries for use on a wide variety of operating systems. Fortran Fortran has been around for decades and has a long history in scientific and engineering communities. It has a more challenging syntax and is considered harder to learn than some of the other languages on this list. It is primarily used for scientific and engineering applications that require numerical computations and simulations such as computational physics, weather forecasting, and computational chemistry. Rust Rust is a systems programming language that focuses on speed, reliability, and memory safety. It is commonly used for developing performance-critical software, such as operating systems, web browsers, and game engines. 
Rust’s syntax is similar to C++, but it is often considered to have a steeper learning curve due to its focus on memory safety and ownership concepts. AI and Cloud Computing AI in cloud computing is an exciting and rapidly advancing field that combines artificial intelligence (AI) and cloud computing technologies. Cloud computing provides on-demand access to computing resources over the internet, such as storage, processing, and analytics, while AI uses algorithms and models to enable machines to perform tasks traditionally requiring human intelligence. Developers can easily incorporate AI into applications using cloud-based AI services. For example, Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform offer AI services like speech recognition, image recognition, natural language processing, and machine learning. AI can also optimize cloud resources and performance. It can allocate resources based on workload demands, optimize power usage, detect threats, and improve application latency by predicting user behavior and preloading content. However, integrating AI and cloud computing presents challenges around data, security, privacy, ethics, complexity, and management that must be addressed carefully. Large amounts of data are needed to train AI models, especially deep learning models, which require vast amounts of labeled data to identify patterns and make predictions accurately. Storing and processing very large datasets can be expensive, which affects cost, scalability, and environment performance. Ensuring data security, privacy, and ethical use also requires consideration. AI and cloud computing are both complex and require specialized skills and knowledge to use effectively. Integrating them increases complexity, which makes management and maintenance more challenging. Careful planning and execution involve evaluating the benefits and limits of AI, selecting appropriate models and algorithms, and determining how to integrate them into the cloud environment.
Monitoring evaluates model performance, enabling adjustments to optimize performance and efficiency. As both AI and cloud computing technologies advance, the intelligent integration of the two can provide on-demand access to advanced AI capabilities, optimize resources, facilitate smarter applications, and gain data-driven business insights. Integration challenges around data, security, ethics, and complexity must be managed proactively through evaluation, planning, monitoring, and adjustments to maximize benefits and minimize limitations. When implemented responsibly, AI in cloud computing enables remarkable intelligent and scalable systems transforming organizations and society. "],["about-the-authors.html", "About the Authors", " About the Authors These credits are based on our course contributors table guidelines.     Credits Names Pedagogy Lead Content Instructor Elizabeth Humphries (Writing Code, Understanding Unfamiliar Code) Content Authors Candace Savonen (Annotating Your Code), Carrie Wright (Ethics of Using AI), Ava Hoffman (Refactoring Code) Content Editor Reviewer Howard Baek Technical Template Publishing Engineers Candace Savonen, Carrie Wright, Ava Hoffman Publishing Maintenance Engineer Candace Savonen Technical Publishing Stylists Carrie Wright, Ava Hoffman, Candace Savonen Package Developers (ottrpal) Candace Savonen, John Muschelli, Carrie Wright Art and Design Figure Artists Candace Savonen, Carrie Wright, Ava Hoffman, Elizabeth Humphries Funding Funders CS, CW, and HB were supported by the National Cancer Institute (NCI) under Grant UE5CA254170. AMH and EMH were supported by the AnVIL Project through cooperative agreement awards from the National Human Genome Research Institute with cofunding from OD/ODSS to the Broad Institute (U24HG010262) and Johns Hopkins University (U24HG010263). 
Funding Staff Sandra Ormbrek, Shasta Nicholson   Tools Used: ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2023-07-17 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2023-03-28 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.0.2) ## cachem 1.0.7 2023-02-24 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) ## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) ## evaluate 0.20 2023-01-17 [1] CRAN (R 4.0.2) ## fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0) ## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.0.2) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2023-03-28 [1] Github (yihui/knitr@a1052d1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.0.2) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## ottrpal 1.0.1 2023-03-28 [1] Github (jhudsl/ottrpal@151e412) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.3) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 
[1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## readr 1.4.0 2020-10-05 [1] RSPM (R 4.0.2) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.0 2023-03-14 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2023-03-28 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.2) ## sass 0.4.5 2023-01-24 [1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2023-03-28 [1] Github (R-lib/testthat@e99155a) ## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.0.2) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## utf8 1.1.4 2018-05-24 [1] RSPM (R 4.0.3) ## vctrs 0.6.1 2023-03-22 [1] CRAN (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2023-03-28 [1] Github (yihui/xfun@74c2a66) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["references.html", "References", " References "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] diff --git a/docs/no_toc/understanding-unfamiliar-code.html b/docs/no_toc/understanding-unfamiliar-code.html index 977cf992..fbcbc4dc 100644 --- a/docs/no_toc/understanding-unfamiliar-code.html +++ b/docs/no_toc/understanding-unfamiliar-code.html @@ -417,6 +417,10 @@

    6.3 AI in Reading Code

    What is the purpose of the summary? If you are summarizing the code for educational or research purposes, it is likely ethical to do so. However, if you are summarizing the code for commercial purposes, you may need to get permission from the owner.

    Does the code contain any sensitive or personal information? If the code contains any sensitive or personal information, you should take special care to protect that information.

    +
    +

    The information presented in this course is meant for use with open source code and software. It is unclear what happens to the information fed to AI chatbots as prompts, or how secure the data are. We know data are saved and may be used to further train the AI tools, but the specifics of how data are saved, as well as how sensitive or personally identifiable information is protected, are unknown.

    +

    Err on the side of caution when interacting with them. We do not recommend using proprietary code or private information for prompts unless you are working with an AI that you or your company built and you know is secure.

    +

    6.4 Example One: Summarizing Code You Didn’t Write

    diff --git a/docs/no_toc/writing-code.html b/docs/no_toc/writing-code.html index 48a22335..31ef5b16 100644 --- a/docs/no_toc/writing-code.html +++ b/docs/no_toc/writing-code.html @@ -734,7 +734,7 @@

    3.12 Summary

    ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC -## date 2023-07-14 +## date 2023-07-17 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source