### Navigation Reminder

- **Grey cells** are **code cells**. Click inside them and type to edit.
- **Run** code cells by pressing $\triangleright$ in the toolbar above, or by pressing ```Shift + Enter```.
- **Stop** a running process by clicking &#9634; in the toolbar above.
- You can **add new cells** by clicking to the left of a cell and pressing ```A``` (for above), or ```B``` (for below). 
- **Delete cells** by pressing ```X```.
- Run all code cells that import objects (such as the one below) to ensure that you can follow exercises and examples.
- Feel free to edit and experiment - you will not corrupt the original files.

# Basic Python Data Types: Text Values

We defined **strings** previously in the lesson on basic data types.

> **Strings** are text data, distinguished by being surrounded by single or double quotes.

In this lesson, we will learn more about strings, including how to operate on them, how to access parts of strings (substrings and single characters) and useful methods included with the string object type.

---
Questions and exercises are distributed throughout this lesson. Please run the code cell below to import them before starting the lesson. The code will not produce any visible output, but exercises and questions will be loaded for later use.

In [None]:
# Run this cell to import lesson exercises
from QuestionsStrings import E1,E2,E3,E4,E5,Q6,E7, question, solution

---
## Lesson Goals
- String operators (+, *, in, not in)
- String indexing (fetching characters)
- String slicing (fetching parts of strings)
- Useful string methods

**Key Concepts** concatenation, string indexing, string slicing, string methods

---
# Working with Strings

**Strings** are text data, and are expressed inside single or double quotes. Python treats both styles the same way, so you can choose either.

```python
x = 'text'
x = "text"
```

Above, both statements are equivalent.

A few rules derive from the demarcation of strings with quotes. 

Occasionally you will have text that includes quotes within the string. If you choose to surround your string with double quotes, you can include single quotes within it, and vice versa:

```python
x = "Chicago O'Hare"
```

Quotes (and other special characters) can also be **escaped** by preceding them with the backslash (\\) character.

```python
x = 'Chicago O\'Hare'
```

Alternatively, if you have a string with backslashes in it (for example, a file path on Windows), you can turn it into a 'raw string', which disables escaping, by preceding the string with the letter r, as shown here:
```python
r'\string'
```
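
As a quick check of the points above, the following sketch compares an escaped string with its double-quoted equivalent, and a normal string with a raw string. The variable names and sample values are illustrative only and do not come from the lesson files.

```python
# Two ways of writing the same text: escaping the quote, or switching quote style
escaped = 'Chicago O\'Hare'
double_quoted = "Chicago O'Hare"
print(escaped == double_quoted)  # True: both hold exactly the same characters

# In a normal string, \t is an escape sequence for a single tab character.
# In a raw string, the backslash and the t are kept as two separate characters.
normal = 'a\tb'
raw = r'a\tb'
print(len(normal))  # 3
print(len(raw))     # 4
```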

Finally, if a string spans several lines, enclose it in triple quotes (three single or three double quotes):
```python
x = '''this
is
one
string'''
```

**Exercise 1** The following lines of code attempt to assign a text value to a variable, but they contain some syntax errors. Edit them (without deleting any characters) by adding the proper quotation marks or using escape characters so that the text in the cell can be stored as a single string.

In [None]:
a = Scarlett O'Hara

In [None]:
b = “The time has come," the walrus said, "to talk of many things”

In [None]:
c = “But wait a bit,” the Oysters cried,
      “Before we have our chat;
     For some of us are out of breath,
      And all of us are fat!”

In [None]:
d = Filefolder\document.txt

In [None]:
solution(E1)

---
## String Operators

As with numerical values, we can also conduct operations with strings, but the operators have different meanings: 

|symbol| operation|
|--|--|
|+| concatenate|
|*| repeat|
|in |checks whether a substring is in the string|
|not in| checks whether a substring is not in the string|
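
A minimal sketch of the four operators in the table, using a hypothetical `word` variable chosen only for illustration:

```python
word = 'paint'  # hypothetical example value

print(word + 'er')        # concatenate: painter
print(word * 2)           # repeat: paintpaint
print('ain' in word)      # True: 'ain' appears inside 'paint'
print('Ain' in word)      # False: the check is case-sensitive
print('oil' not in word)  # True: 'oil' does not appear in 'paint'
```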

When concatenating separate words, remember that concatenation (+) does not add spaces automatically; you have to include them in the strings yourself.

For example, we can concatenate the artist name with other terms to form a sentence:

In [None]:
artist = 'Frida Kahlo'

In [None]:
print( artist + "was a painter")

Note that we can concatenate a string variable with a string we are specifying in place. But also note that the program did not include a space between the name and the start of the second string. We have to include that ourselves, in the new string.

In [None]:
print( artist + " was a painter") # included a space

Trying to concatenate a string with a value of another data type, such as an integer, will not work: Python raises a TypeError.

In [None]:
print(artist + ' was born in ' + 1907) # will raise an error

To concatenate these, you would have to convert the integer into a string using quotes or the str() function, as we saw in the previous lesson.

In [None]:
print(artist + ' was born in ' + '1907') # spaces included inside the middle string
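
The other option mentioned above, converting with str(), might look like the following sketch. The `birth_year` and `sentence` names are hypothetical and are not used elsewhere in the lesson.

```python
artist = 'Frida Kahlo'
birth_year = 1907  # an integer, not a string

# str() returns the text version of the number, which can then be concatenated
sentence = artist + ' was born in ' + str(birth_year)
print(sentence)  # Frida Kahlo was born in 1907
```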

**Exercise 2: Use concatenation to print the artist and her life span in the conventional way, i.e. Artist Name (Birth-Death).** 

This is tricky. Remember to take spaces into account, and to write any new strings in between single or double quotes. Also remember to consider variables' data types. 

You can use the empty code cell below. Run your cell to check your output, and iterate as necessary. You will probably need to experiment a little! 

When you are ready, you can also check the correct answer below. The cell's output should be Frida Kahlo (1907-1954).

In [None]:
year_of_birth = 1907
year_of_death = 1954

In [None]:
solution(E2)

---
# Getting a Character in a String: String Indexing 

In Python, strings are in essence sequences of characters. We can access parts of a string by referencing their position within it.

<br>
<div>
<img src="Other_files/string2.png" width="300"/>
</div>
<br>

We can access characters in strings by giving the name of the variable followed by a position in square brackets. Each character is **indexed** by its position in the string, starting at **0**. This can be a bit confusing!

So for instance, say we have a variable called var, which contains the text value 'string'.

```python
var = 'string'
```

In the diagram, **var[0]** would return the first character 's', for example.
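
A short sketch of indexing on the same `var` string, showing that the last valid position is one less than the number of characters:

```python
var = 'string'

print(var[0])    # s  (first character)
print(var[1])    # t  (second character)
print(var[5])    # g  (sixth and last character)
print(len(var))  # 6  (six characters, at positions 0 to 5)

# Asking for a position past the end, such as var[6], raises an IndexError.
```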

**Exercise 3** Retrieve the second character of the artist variable.

In [None]:
solution(E3)

**Exercise 4** Determine the positions of our artist's initials, and then edit the code below to extract, concatenate and print them. Note that whitespace characters also take up a position.

In [None]:
firstinitial = artist[_]
lastinitial = artist[_]

print(firstinitial + '.' + lastinitial + '.')

In [None]:
solution(E4)

Finally, we can also access characters starting from the end of the string, which is done with negative indexes:

<br>
<div>
<img src="Other_files/string3.png" width="300"/>
</div>

For example, if we were looking for the second-to-last character of a string, we could do var[-2].

In [None]:
artist[-2]
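
Negative positions count back from the end, so each one has a matching positive position. A small sketch on the `var` string from earlier:

```python
var = 'string'

print(var[-1])  # g  (last character)
print(var[-2])  # n  (second-to-last character)
print(var[-6])  # s  (same as var[0] for a six-character string)

# A negative index -k refers to the same character as len(var) - k
print(var[-2] == var[len(var) - 2])  # True
```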

---
# Getting Parts of Strings: String Slicing

We can also access a range within the string (a substring), which is called **string slicing**. 

To do so, we write the variable name followed by square brackets containing a start position, a colon, and an end position.

Slicing will pick everything from the start position **up to, but not including**, the end position. In other words, the second number is not inclusive: the last character returned is the one at the end position minus 1.

That is, to get part of a string, we can use:

**var[first_position : last_position + 1].**

**var[0:2]** would return the first and second characters, despite position 2 being that of the third character (confusing, I know!). It helps to remember that position 0 is character 1, and the span goes **up to, but not including** the last position.
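
The rule can be checked on the `var` string from the indexing section. This is only an illustrative sketch:

```python
var = 'string'

print(var[0:2])  # st     (positions 0 and 1)
print(var[2:5])  # rin    (positions 2, 3 and 4)
print(var[1:6])  # tring  (positions 1 to 5)

# The length of a slice is the end position minus the start position
print(len(var[2:5]))  # 3
```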


**Exercise 5** Edit the code below to retrieve the substring that includes our artist's name from her first initial to (and including) her second initial. That is, the output should be 'Frida K'.

In [None]:
artist[___]

In [None]:
solution(E5)

When slicing, we can **omit the start or end positions.** 

Omitting the start position of the slice would take every character from the beginning of the string up to (but not including) the end position. For instance, we could extract the first name of our artist just by giving an end position:

In [None]:
artist[:5]

Omitting the end position would take every character from the start position up to the end of the string:

In [None]:
artist[6:]

And omitting both would extract the whole string.

In [None]:
artist[:]

In [None]:
question(Q6)

---
#  Useful String Methods 

Strings come with [multiple methods](https://docs.python.org/2.5/lib/string-methods.html) that are extremely useful when working with text data. These can be used to create lists of words, clean whitespace and fix capitalization, among other useful actions. 

|**Method**| Action |
|---|---|
|**string.lower()**| makes string lowercase|
|**string.upper()**| makes string uppercase|
|**string.split('separator')**| returns a list of substrings, split on the separator (any run of whitespace, if no separator is given)|
|**string.find('substring')**| gives the position of the first occurrence of a substring in a string, or -1 if it is not found|
|**string.replace('old','new')**| replaces all occurrences of the old substring with the new one|
|**string.lstrip()**| removes whitespace on left|
|**string.rstrip()**| removes whitespace on right|
|**string.strip()**| removes whitespace on both sides|
|**string.startswith('substring')**| checks whether the string starts with the substring, returning True or False|

Another useful tool is the built-in ```len()``` function, which returns the number of characters in a string. You call it by using ```len(string)```.
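
The sketch below tries several of the methods from the table, along with len(), on a made-up sample string. The `sample` and `cleaned` names and the text itself are hypothetical and are not part of the lesson files.

```python
sample = '  Oil on Canvas  '  # hypothetical text with stray whitespace

cleaned = sample.strip()      # a new string without the outer whitespace
print(len(sample))            # 17
print(len(cleaned))           # 13
print(cleaned.lower())        # oil on canvas
print(cleaned.upper())        # OIL ON CANVAS
print(cleaned.split())        # ['Oil', 'on', 'Canvas']
print(cleaned.find('on'))     # 4
print(cleaned.find('paper'))  # -1, because 'paper' does not occur
print(cleaned.replace('Canvas', 'Wood'))  # Oil on Wood
print(cleaned.startswith('Oil'))          # True
```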

String methods do not change the original string that has been saved as a variable; they return a new string. Any modification therefore has to be reassigned to our variable or assigned to a new one.

For example, let's try the lowercase method on our artist variable. Run the code below:

In [None]:
artist.lower()

It outputs a lowercase version of our text. But if we call our variable, we see that its value has not changed:

In [None]:
artist

You could create a new variable to store the output of the method:  

In [None]:
artist_lower = artist.lower()

In [None]:
artist_lower

Alternatively, you could just reassign the output of the method to the existing variable. Doing this will erase its original value (in our case, the version with uppercase characters).

In [None]:
artist = artist.lower()

In [None]:
artist

**Exercise 7:** Run the cell below, which creates a string of many artists' names with incorrect formatting and errors. In the empty cell, make the following changes to the string using string methods. Unless specified, they are listed in the table above.

1. Replace underscores with spaces
1. Replace zeroes with o's
1. Fix capitalization to make it titlecase (you will have to research the [strings documentation](https://docs.python.org/2.5/lib/string-methods.html) for this one).
1. This variable is a string. Transform it into a list of individual artists' names by splitting on the comma character (,).

Remember that the methods themselves will not be applied to the original string directly.

The final output should be a list of artists' names, correctly spelled and capitalized.

If you would like to work in multiple cells, you can add cells by clicking to the left of a cell and pressing A or B (to add above or below).

In [None]:
artists = 'PaBl0_Picass0,HeNri_MatiSse,Ed0uard_MaNet,S0f0nisba_AnGUiss0la,MaRie_CaSsAtt'

In [None]:
print(artists)

The output you should see if you have done this exercise correctly looks like this (perhaps in several lines):

['Pablo Picasso',
'Henri Matisse',
'Edouard Manet',
'Sofonisba Anguissola',
'Marie Cassatt']

The output is a list of strings. You can recognize lists because they are delimited by square brackets, and each item within the list is separated by a comma. The strings within that list are delimited by quotes, as we have already learned. 

Lists are one of several data types that allow us to store multiple values. We study lists in more detail in Lesson 08.

In [None]:
solution(E7)

--- 
# Lesson Summary

- Strings are delimited by "" or ''
- Strings containing quotes, multiple lines or special characters need extra handling (escape characters, raw strings or triple quotes)
- \+ and * can be used to concatenate and repeat strings; in and not in check for substrings
- string[a]: strings can be indexed by position, starting at 0
- string[a:b]: strings can be sliced, up to but not including position b
- String methods provide powerful actions for string manipulation.

<div style="text-align:center">    
  <a href="03%20Basic%20Data%20Types%20I%20-%20Numbers.ipynb"> Previous Lesson: Basic Data Types I: Numbers & Numerical Operators</a>|
   <a href="05%20Basic%20Data%20Types%20III%20-%20Collections.ipynb">Next Lesson: Basic Data Types III: Collections</a>
</div>