Hey there!
Welcome to ClearUrDoubt.com.
In this post, we will look at a Python program that prepares test data for practicing PySpark.
We are going to use the Faker module to generate the test data. Before we can use Faker, we need to install it in the local virtual environment:
```
pip install Faker
```
Once the module is installed, we can write a simple Python program that generates the test data and prints it to the console (redirect the output to a file if you want to save it):
```python
"""
Faker demo for generating test data
"""
from faker import Faker

if __name__ == '__main__':
    faker = Faker()

    print('Generating sample data:\n')

    for _ in range(1, 30):
        print(f"{faker.name()} - {faker.country()} - {faker.email()}")
```
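Output: the program prints the line "Generating sample data:" followed by 29 lines of the form name - country - email. The values are random, so they will differ on every run.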
Happy learning.
Please leave a reply in case of any queries.