Extracting and Cleaning Up Data

Tip
For this exercise, you can practice with either a reasoning model (o3, o4-mini) or a non-reasoning model (4o, 4.1).

Description

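AI chatbots can extract specific pieces of information from unstructured text, such as names, ages, and locations buried in free-form paragraphs. They can also return that information in a clean, consistent format, such as a table with the same columns for every record.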
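One way to check that an extraction is "clean and consistent" is to validate each extracted row against a fixed schema. The following is a minimal sketch, not part of the original exercise. The fields mirror the columns requested in the prompt on this page; the `StudentRecord` class, the `validate` function, and the age bounds are hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Degree levels allowed by the prompt: undergraduate, graduate, other.
ALLOWED_LEVELS = {"undergraduate", "graduate", "other"}


@dataclass
class StudentRecord:
    """One extracted row (hypothetical schema for illustration)."""
    name: str
    age: int
    state: str
    interests: list[str]
    degree: str
    degree_level: str


def validate(record: StudentRecord) -> list[str]:
    """Return a list of problems found in one record (empty if it looks clean)."""
    problems = []
    if not record.name.strip():
        problems.append("missing name")
    # The age bounds are an assumption, not something the exercise specifies.
    if not (0 < record.age < 120):
        problems.append(f"implausible age: {record.age}")
    if record.degree_level.lower() not in ALLOWED_LEVELS:
        problems.append(f"unknown degree level: {record.degree_level}")
    if not record.interests:
        problems.append("no interests listed")
    return problems
```

A record that passes returns an empty list, so rows with any reported problem can be flagged for manual review or sent back to the model.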

Prompt


You are an expert data extractor. Extract the names, states and ages of each student in their biographies (delimited by ###). Extract the interests (comma separated), pursuing degree and degree level (undergraduate, graduate, other). Print the results as a table.

###
1. Emily is 22 years old and lives in California. Growing up I was always interested in the sciences and technology. In high school, I excelled in math and science classes, which led me to pursue a degree in engineering. I went on to earn my Bachelor’s degree in Engineering and am currently working on my Master’s in Civil Engineering. After I finish my degree, I plan to use my knowledge to help improve infrastructure in developing countries.

2. Michael is 24 years old and lives in Texas. Since a young age, I have been passionate about helping people in need. During college, I decided to focus my studies on sociology and political science. I used my knowledge to take action and participated in several non-profit organizations to promote social justice. With the help of internships, I have also gained experience in policy development and public relations. I’m currently working on a graduate degree in Social Work and aim to eventually work for the United Nations.

3. Sarah is 21 years old and lives in New York. As a freshman in high school, I was unsure of what kind of career path I wanted to pursue. After talking with my parents and teachers, I decided to start college as an undeclared major. After two years of exploration, I settled on a double major of business and economics. I’ve been able to take advantage of various opportunities on and off campus and served as an intern in the Human Resources department of a major banking firm. I look forward to earning my degree and using my skills to become a successful business leader.

4. David is 23 years old and lives in Colorado. I never thought that I had the ability to become a professional photographer, but my high school photography teacher pushed me to pursue my dreams. I learned how to use a variety of cameras, develop photographs, and post-process my images. I continued my photography studies at college, where I gained additional knowledge in studio lighting and digital editing. I was even able to gain experience in the fashion industry, interning for a well-known photographer. Today, I am working as an event photographer, documenting weddings, reunions, and corporate events.

5. Jessica is 25 years old and lives in Florida. When I was younger I was passionate about art, which prompted me to pursue a bachelor’s degree in Fine Arts. During college, I was able to gain experience as a studio assistant and also learn various digital and traditional art techniques. With the help of a scholarship, I was able to travel to various countries and learn even more about different art styles. After graduation, I started working as a freelance artist and have been able to produce several commissioned artworks and pieces.

6. Daniel is 22 years old and lives in Washington. As a child, I always exhibited an aptitude for mathematics and problem solving. After researching various career paths, I decided to major in Computer Science. I was able to gain valuable experience while interning at a tech startup and also during an internship with the Department of Defense. I am currently working on my master’s degree and plan to focus my studies on artificial intelligence and machine learning.

7. Olivia is 20 years old and lives in Georgia. I have been involved in theater since I was a young child. During my high school years, I focused on honing my abilities through various extracurricular activities. I was able to gain valuable experience by participating in multiple productions and I even gained a scholarship for theater. I attended college to study Musical Theater and continued to foster my talent. With help from internships and workshops, I was able to build additional experience and formed a touring theater company with some of my colleagues.

8. Matthew is 23 years old and lives in Oregon. From an early age, I was interested in the environment and the outdoors. I spent a lot of time reading environmental books and researching environmental issues. This passion inspired me to major in Environmental Science in college. I’ve gained valuable experience through various internships and part-time jobs. With the help of my degree, I’ve been able to work on several conservation projects and hope to soon work for a non-profit organization focused on sustainability.

9. Sophia is 24 years old and lives in Virginia. When I graduated high school I wanted to focus my career on the medical field. After much research, I decided to major in Biomedical Engineering. During my time in college, I was able to gain a valuable experience by interning as a research assistant. I have been able to learn more about medical technology and have been able to contribute to various projects. I’m currently pursuing a master’s degree in Biomedical Engineering and plan to continue my research in the field and eventually work for a healthcare company.

10. Andrew is 22 years old and lives in Illinois. I have always had an interest in fashion and design, so when the time came to decide my career path it made sense to pursue design. During college, I was able to study many aspects of fashion and gain experience through various internships. I was also able to travel to other countries to observe trends and learn about different cultures. With my degree, I am currently working as a fashion designer and I plan to continue to use my creative eye to come up with innovative and stylish designs.
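If you want to reuse this prompt with your own biographies, the instruction and the delimited, numbered text can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_prompt` is a hypothetical helper, and the closing delimiter it appends is a design choice of this sketch (the prompt above shows only an opening `###`).

```python
def build_prompt(instruction: str, biographies: list[str], delimiter: str = "###") -> str:
    """Combine an instruction with numbered biographies wrapped in a delimiter."""
    # Number each biography starting from 1, matching the layout of the prompt above.
    numbered = [f"{i}. {bio.strip()}" for i, bio in enumerate(biographies, start=1)]
    body = "\n\n".join(numbered)
    # Assumption: a closing delimiter is added so the end of the data is unambiguous.
    return f"{instruction.strip()}\n\n{delimiter}\n{body}\n{delimiter}"
```

Keeping the instruction separate from the data makes it easy to swap in a different set of biographies without touching the wording of the task.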
  

Sample Output

The AI will do its best to extract meaningful data from the information given. Below is an example response to the prompt above.

"Here’s the extracted information in tabular form:

| Name | Age | State | Interests | Pursuing Degree | Degree Level |
| --- | --- | --- | --- | --- | --- |
| Emily | 22 | California | Sciences, technology, math, engineering | Master’s in Civil Engineering | Graduate |
| Michael | 24 | Texas | Helping people, sociology, political science, social justice, policy, public relations | Graduate degree in Social Work | Graduate |
| Sarah | 21 | New York | Business, economics, human resources | Bachelor’s in Business & Economics | Undergraduate |
| David | 23 | Colorado | Photography, studio lighting, digital editing, fashion | Bachelor’s in Photography | Undergraduate |
| Jessica | 25 | Florida | Art, digital art, traditional art | Bachelor’s in Fine Arts | Undergraduate |
| Daniel | 22 | Washington | Mathematics, problem solving, computer science, AI, machine learning | Master’s in Computer Science (AI/ML focus) | Graduate |
| Olivia | 20 | Georgia | Theater, musical theater, performing arts | Bachelor’s in Musical Theater | Undergraduate |
| Matthew | 23 | Oregon | Environment, outdoors, conservation | Bachelor’s in Environmental Science | Undergraduate |
| Sophia | 24 | Virginia | Medicine, biomedical engineering, medical technology | Master’s in Biomedical Engineering | Graduate |
| Andrew | 22 | Illinois | Fashion, design, culture, creativity | Bachelor’s in Fashion Design | Undergraduate |

Would you like me to also normalize the "interests" into a consistent list of keywords (e.g., "STEM," "Arts," "Social Sciences") for easier categorization?"
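The follow-up offer in the sample response, normalizing interests into broader categories, can also be approximated outside the chatbot. The sketch below is one simple keyword-based approach, not what the model itself does. The category names come from the response above; the keyword sets and the functions `categorize` and `normalize_interests` are hypothetical and would need tuning for real data.

```python
# Hypothetical keyword sets; order matters because the first matching category wins
# (so "political science" is treated as Social Sciences rather than STEM).
CATEGORY_KEYWORDS = {
    "Social Sciences": {"sociology", "economics", "business", "policy", "political"},
    "STEM": {"science", "sciences", "technology", "math", "mathematics",
             "engineering", "computer", "ai"},
    "Arts": {"art", "theater", "photography", "fashion", "design"},
}


def categorize(interest: str) -> str:
    """Map a single interest phrase to a category using whole-word matches."""
    words = set(interest.lower().split())
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:
            return category
    return "Other"


def normalize_interests(interests: str) -> list[str]:
    """Map a comma-separated interests string to a sorted list of unique categories."""
    categories = {categorize(item) for item in interests.split(",") if item.strip()}
    return sorted(categories)
```

Whole-word matching is used instead of substring matching so that, for example, "art" does not accidentally match "artificial". Anything the keyword sets do not cover falls into "Other", which makes gaps in the mapping easy to spot.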