Input and Output

  • str() returns human-readable representations of values.

  • repr() generates representations that can be read by the interpreter.

  • For objects which don’t have a particular representation for human consumption, str() will return the same value as repr().

s = 'Hello, world.'
str(s)
'Hello, world.'
l = list(range(4))
str(l)
'[0, 1, 2, 3]'
repr(s)
"'Hello, world.'"
repr(l)
'[0, 1, 2, 3]'
x = 10 * 3.25
y = 200 * 200
s = 'The value of x is ' + str(x) + ', and y is ' + repr(y) + '...'
print(s)
The value of x is 32.5, and y is 40000...
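Classes you write can control both representations. The sketch below uses two hypothetical classes, Point and Temperature (they are not from the original text). Point defines only __repr__, so str() falls back to it. Temperature defines both __str__ and __repr__.

```python
class Point:
    """Hypothetical class that defines only __repr__."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # Mirror the constructor call so the interpreter could rebuild the object.
        return f"Point({self.x!r}, {self.y!r})"


class Temperature:
    """Hypothetical class: __str__ for people, __repr__ for developers."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __str__(self):
        return f"{self.celsius} degrees Celsius"

    def __repr__(self):
        return f"Temperature({self.celsius!r})"


p = Point(1, 2)
t = Temperature(21.5)
print(str(p), repr(p))  # Point(1, 2) Point(1, 2)
print(str(t))           # 21.5 degrees Celsius
print(repr(t))          # Temperature(21.5)
```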

repr() of a string adds string quotes and backslashes:

hello = 'hello, world\n'
hellos = repr(hello)
hellos
"'hello, world\\n'"
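Because repr() escapes special characters, its result for a string is a valid Python string literal. As a small illustration, ast.literal_eval() from the standard library can safely turn that literal back into the original string:

```python
import ast

hello = 'hello, world\n'
hellos = repr(hello)

# The newline character has become two characters: a backslash and 'n'.
print(len(hello), len(hellos))  # 13 16

# literal_eval() parses the literal without executing arbitrary code.
restored = ast.literal_eval(hellos)
print(restored == hello)  # True
```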

The argument to repr() may be any Python object:

repr((x, y, ('spam', 'eggs')))
"(32.5, 40000, ('spam', 'eggs'))"
n = 7
for x in range(1, n):
    for i in range(n):
        print(repr(x**i).rjust(i+2), end=' ') # rjust or center can be used
    print()
 1   1    1     1      1       1        1 
 1   2    4     8     16      32       64 
 1   3    9    27     81     243      729 
 1   4   16    64    256    1024     4096 
 1   5   25   125    625    3125    15625 
 1   6   36   216   1296    7776    46656 
for x in range(1, n):
    for i in range(n):
        print("%07d" % x**i, end=' ')  # old printf-style (%) formatting
    print()
0000001 0000001 0000001 0000001 0000001 0000001 0000001 
0000001 0000002 0000004 0000008 0000016 0000032 0000064 
0000001 0000003 0000009 0000027 0000081 0000243 0000729 
0000001 0000004 0000016 0000064 0000256 0001024 0004096 
0000001 0000005 0000025 0000125 0000625 0003125 0015625 
0000001 0000006 0000036 0000216 0001296 0007776 0046656 
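The padding in the two tables can also be produced with other string methods. A short sketch of the alignment helpers named in the comment of the first loop, plus str.zfill(), which gives the same zero padding as '%07d' for non-negative integers:

```python
s = '64'
print(repr(s.rjust(6)))   # '    64'
print(repr(s.ljust(6)))   # '64    '
print(repr(s.center(6)))  # '  64  '

# zfill() pads with zeros on the left, like the printf-style '%07d'.
print(s.zfill(7), '%07d' % 64)  # 0000064 0000064
```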

Usage of the str.format() method

print('We are at the {} in {}!'.format('ENSAI', 'Rennes'))
We are at the ENSAI in Rennes!
print('From {0} to {1}'.format('September 7', 'September 14'))
From September 7 to September 14
print('It takes place at {place}'.format(place='Milon room'))
It takes place at Milon room
import math
print('The value of PI is approximately {:.7g}.'.format(math.pi))
The value of PI is approximately 3.141593.
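A few more features of str.format(), as a sketch: positional indices can be reused, keyword fields can be filled from a dictionary, and the format spec after the colon controls alignment, width, grouping and precision. The values below are only illustrations.

```python
import math

# Positional indices can be reused and reordered.
reordered = '{1} and {0}, then {1} again'.format('spam', 'eggs')
print(reordered)  # eggs and spam, then eggs again

# Keyword fields can be filled from a dictionary with ** unpacking.
event = {'place': 'Milon room', 'city': 'Rennes'}
print('{place} in {city}'.format(**event))  # Milon room in Rennes

# After the colon: alignment (< ^ >), width, thousands separator, precision.
print('|{:<6}|{:^6}|{:>6}|'.format('ab', 'cd', 'ef'))  # |ab    |  cd  |    ef|
print('{:,}'.format(1234567))            # 1,234,567
print('{:.2%}'.format(0.25))             # 25.00%
print(repr('{:8.3f}'.format(math.pi)))   # '   3.142'
```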

Formatted string literals (Python 3.6+)

print(f'The value of PI is approximately {math.pi:.4f}.')
The value of PI is approximately 3.1416.
name = "Fred"
print(f"He said his name is {name}.")
print(f"He said his name is {name!r}.")
He said his name is Fred.
He said his name is 'Fred'.
f"He said his name is {repr(name)}."  # repr() is equivalent to !r
"He said his name is 'Fred'."
width, precision = 10, 4
value = 12.34567
print(f"result: {value:{width}.{precision}f}")  # nested fields
result:    12.3457
from datetime import datetime
today = datetime(year=2017, month=1, day=27)
print(f"{today:%B %d, %Y}")  # using date format specifier
January 27, 2017
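A few more f-string features, as a sketch: arbitrary expressions inside the braces, doubled braces for literal { and }, and the self-documenting = specifier, which requires Python 3.8 or later.

```python
name = "Fred"
value = 12.34567

# Any expression can go inside the braces, including method calls.
print(f"{name.upper()} has {len(name)} letters")  # FRED has 4 letters

# Doubled braces produce literal braces.
print(f"{{name}} is {name}")  # {name} is Fred

# Python 3.8+: '=' echoes the expression text before its value.
print(f"{value=}")        # value=12.34567
print(f"{value = :.2f}")  # value = 12.35
```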

Reading and Writing Files

open() returns a file object. It is most commonly called with two arguments: a file name and an access mode.

f = open('workfile.txt', 'w')
f.write("1. This is a txt file.\n")
f.write("2. \\n is used to begin a new line")
f.close()
!cat workfile.txt
1. This is a txt file.
2. \n is used to begin a new line
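The example above closes the file explicitly with f.close(). A sketch of the same kind of task using a with block, which closes the file even if an error occurs, and three ways of writing text. The file name notes.txt is only an illustration.

```python
# The with block closes the file automatically when it ends.
with open('notes.txt', 'w', encoding='utf-8') as f:
    n = f.write("first line\n")    # write() returns the number of characters written
    print("second line", file=f)   # print() can target a file; it adds the newline
    f.writelines(["third line\n", "fourth line\n"])  # writelines() adds no newlines

print(n)  # 11
with open('notes.txt', encoding='utf-8') as f:
    print(f.read(), end='')
```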

The mode can be:

  • ‘r’ when the file will only be read,

  • ‘w’ for writing only (an existing file with the same name will be erased),

  • ‘a’ opens the file for appending; any data written to the file is automatically added to the end.

  • ‘r+’ opens the file for both reading and writing.

  • The mode argument is optional; ‘r’ will be assumed if it’s omitted.

  • Normally, files are opened in text mode.

  • ‘b’ appended to the mode opens the file in binary mode.
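The modes listed above can be compared on one small file. A sketch, using a hypothetical file modes_demo.txt: 'w' creates or truncates, 'a' appends, and 'rb' returns bytes instead of str.

```python
# 'w' creates the file or truncates an existing one.
with open('modes_demo.txt', 'w') as f:
    f.write('one\n')

# 'a' keeps the existing content and writes at the end.
with open('modes_demo.txt', 'a') as f:
    f.write('two\n')

# 'r' is the default mode and returns str.
with open('modes_demo.txt') as f:
    print(f.read().splitlines())  # ['one', 'two']

# Appending 'b' gives binary mode: read() returns bytes, not str.
with open('modes_demo.txt', 'rb') as f:
    data = f.read()
print(type(data).__name__, data.splitlines())  # bytes [b'one', b'two']
```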

with open('workfile.txt') as f:
    read_text = f.read()
f.closed
True
read_text
'1. This is a txt file.\n2. \\n is used to begin a new line'
lines = []
with open('workfile.txt') as f:
    lines.append(f.readline())
    lines.append(f.readline())
    lines.append(f.readline())

lines
['1. This is a txt file.\n', '2. \\n is used to begin a new line', '']

  • f.readline() returns an empty string when the end of the file has been reached.

  • Both f.readlines() and list(f) read all the lines of a file into a list.
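Because readline() signals the end of the file with an empty string (a blank line is '\n', not ''), it can drive a while loop; readlines() and list(f) return the same list in one call. A sketch that first writes a small hypothetical file, lines_demo.txt:

```python
with open('lines_demo.txt', 'w') as f:
    f.write('alpha\nbeta\ngamma')  # no newline after the last line

# Read line by line until readline() returns ''.
collected = []
with open('lines_demo.txt') as f:
    while True:
        line = f.readline()
        if line == '':
            break
        collected.append(line)

with open('lines_demo.txt') as f:
    all_lines = f.readlines()

with open('lines_demo.txt') as f:
    same_lines = list(f)

print(collected == all_lines == same_lines)  # True
print(all_lines)  # ['alpha\n', 'beta\n', 'gamma']
```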

For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:

with open('workfile.txt') as f:
    for line in f:
        print(line, end='')
1. This is a txt file.
2. \n is used to begin a new line
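Since iterating over the file yields one line at a time, the whole file never has to be held in memory. A sketch that counts lines and finds the longest one lazily, stripping the newline that iteration keeps; big_demo.txt is a hypothetical stand-in for a large file.

```python
with open('big_demo.txt', 'w') as f:
    for i in range(1000):
        f.write(f"line {i}\n")

n_lines = 0
longest = ''
with open('big_demo.txt') as f:
    for line in f:                 # only one line in memory at a time
        line = line.rstrip('\n')   # drop the trailing newline
        n_lines += 1
        if len(line) > len(longest):
            longest = line

print(n_lines, longest)  # 1000 line 100
```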

Exercise: Wordcount Example

WordCount is a simple application that counts the number of occurrences of each word in a given input set.

  • Use the lorem module to write some text to the file “sample.txt”.

  • Write a function words that takes a file name as input and returns a sorted list of the words present in the file, as (word, 1) pairs.

  • Write a function reduce that reads the result of words, sums the occurrences of each word into a final count, and returns the results as a dictionary {word1: occurrences1, word2: occurrences2}.

  • You can check the results using piped shell commands:

cat sample.txt | fmt -1 | tr '[:upper:]' '[:lower:]' | tr -d '.' | sort | uniq -c
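If the shell tools are not available, a rough Python counterpart of that pipeline can serve as a cross-check. This sketch uses collections.Counter, which is not part of the exercise. Unlike the pipeline, split() ignores the empty lines that fmt -1 emits for paragraph breaks, so no count is reported for an empty string. The file tiny_sample.txt and the function name pipeline_count are hypothetical.

```python
from collections import Counter


def pipeline_count(filename):
    """Rough equivalent of:
    cat FILE | fmt -1 | tr '[:upper:]' '[:lower:]' | tr -d '.' | sort | uniq -c
    """
    with open(filename) as f:
        text = f.read()
    # Lowercase, delete periods, then split on any whitespace.
    tokens = text.lower().replace('.', '').split()
    return Counter(tokens)


with open('tiny_sample.txt', 'w') as f:
    f.write("Sit amet sit. Amet dolor.\n\nSit quiquia.\n")

counts = pipeline_count('tiny_sample.txt')
for word in sorted(counts):  # alphabetical, like `sort | uniq -c`
    print(f"{counts[word]:7d} {word}")
```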
from lorem import text

text()
'Voluptatem consectetur consectetur voluptatem quisquam amet consectetur. Dolore quiquia magnam sit eius labore dolor quisquam. Sit porro labore sed porro labore quisquam aliquam. Etincidunt dolorem consectetur quaerat ut dolore eius. Consectetur etincidunt aliquam non neque eius eius modi. Est velit dolorem est sit ipsum. Adipisci magnam voluptatem dolor non ipsum voluptatem quiquia.\n\nDolorem numquam quiquia dolor tempora dolore. Quaerat quiquia ipsum adipisci. Est quiquia aliquam quiquia. Aliquam tempora dolor ipsum quiquia etincidunt dolorem porro. Ipsum voluptatem sed eius sed sed numquam voluptatem. Dolor consectetur quisquam ut ut etincidunt magnam.\n\nQuiquia dolorem amet ipsum velit sed tempora. Dolorem voluptatem amet ut est ipsum eius magnam. Dolore magnam dolorem magnam non ipsum. Velit voluptatem quiquia quaerat. Aliquam est est sit eius amet. Aliquam sit sit est. Numquam quisquam amet numquam numquam ipsum. Est etincidunt modi etincidunt non.\n\nMagnam ut dolore neque voluptatem. Ipsum numquam tempora velit eius dolorem amet. Quiquia porro sit aliquam aliquam porro neque. Dolore dolore quaerat sed tempora tempora dolorem eius. Eius aliquam labore ut modi sit. Labore ipsum aliquam est dolor aliquam. Aliquam non ut amet aliquam adipisci. Amet quiquia tempora velit sed quisquam quaerat dolor. Adipisci non est dolore sit voluptatem. Quiquia labore tempora labore dolore porro.\n\nQuisquam ipsum velit modi modi quiquia. Dolorem numquam porro consectetur. Dolorem neque adipisci dolor eius ut dolorem. Ut est est tempora ipsum adipisci. Velit est est adipisci ut. Quisquam ut numquam amet dolore voluptatem. Est tempora aliquam numquam porro quaerat neque. Amet quisquam magnam quaerat.'
def words(file):
    """ Parse a file and return a sorted list of words """
    pass

words('sample.txt')
#[('adipisci', 1),
# ('adipisci', 1),
# ('adipisci', 1),
# ('aliquam', 1),
# ('aliquam', 1),
d = {}
d['word1'] = 3
d['word2'] = 2
d
{'word1': 3, 'word2': 2}
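A common pattern for building such a count incrementally is dict.get() with a default of 0, which avoids a KeyError the first time a word is seen. A minimal sketch on a hypothetical list of words:

```python
d = {}
for word in ['sit', 'amet', 'sit', 'dolor', 'sit']:
    # get() returns 0 when the word is not a key yet
    d[word] = d.get(word, 0) + 1

print(d)  # {'sit': 3, 'amet': 1, 'dolor': 1}
```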
def reduce(words):
    """ Count the number of occurrences of each word in a list
    and return a dictionary """
    pass

reduce(words('sample.txt'))
#{'neque': 80,
# 'ut': 80,
# 'est': 76,
# 'amet': 74,
# 'magnam': 74,
# 'adipisci': 73,

Saving structured data with json

  • JSON (JavaScript Object Notation) is a popular data interchange format.

  • JSON format is commonly used by modern applications to allow for data exchange.

  • JSON can be used to communicate with applications written in other languages.

import json
json.dumps([1, 'simple', 'list'])
'[1, "simple", "list"]'
x = dict(name="Pierre Navaro", organization="CNRS", position="IR")
with open('workfile.json','w') as f:
    json.dump(x, f)
with open('workfile.json','r') as f:
    x = json.load(f)
x
{'name': 'Pierre Navaro', 'organization': 'CNRS', 'position': 'IR'}
%cat workfile.json
{"name": "Pierre Navaro", "organization": "CNRS", "position": "IR"}
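JSON has fewer types than Python, so a round trip is not always exact. A sketch of what to expect with an illustrative dictionary: tuples come back as lists and non-string dictionary keys come back as strings. The indent and sort_keys arguments of json.dumps() make the output easier to read.

```python
import json

data = {"title": "Input and Output", "tags": ("python", "json"), 2017: True}
s = json.dumps(data)
print(s)  # {"title": "Input and Output", "tags": ["python", "json"], "2017": true}

back = json.loads(s)
print(back == data)  # False: the tuple became a list and 2017 became "2017"

# Pretty-printed, with keys in alphabetical order.
print(json.dumps({"b": 1, "a": [1, 2]}, indent=2, sort_keys=True))
```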

For large data structures, consider ujson, a faster JSON encoder and decoder: https://pypi.python.org/pypi/ujson

For file formats common in data science (CSV, xls, feather, parquet, ORC, HDF, avro, …), use packages like pandas or, better, pyarrow. Depending on what you want to do with your data, Dask and pyspark also offer features to read and write (big) data files.