10 NLP Projects to Boost Your Resume


Natural Language Processing (NLP) is an exciting field, and its applications are already visible all around us in daily life: conversational agents (Amazon Alexa), sentiment analysis (HubSpot's customer feedback analysis feature), language recognition and translation (Google Translate), spelling correction (Grammarly), and much more. 

Whether you’re a developer or a data scientist curious about NLP, why not jump into the deep end of the pool and learn by doing?

With well-known frameworks like PyTorch and TensorFlow, you just launch a Python notebook and you can be working on state-of-the-art deep learning models within minutes. 

In this article, I’ll help you practice NLP by suggesting 10 great projects you can start working on right now—plus, each of these projects will be a great addition to your resume!


A brief history of the NLP field

This is just a bit of background on Natural Language Processing; feel free to skip ahead to the projects if you’re not interested.

NLP was born in the middle of the 20th century. A major historical NLP landmark was the Georgetown Experiment in 1954, in which around 60 Russian sentences were automatically translated into English. 

The 1970s saw the development of a number of chatbot concepts based on sophisticated sets of hand-crafted rules for processing input information. In the late 1980s, singular value decomposition (SVD) was applied to the vector space model, leading to latent semantic analysis—an unsupervised technique for determining the relationship between words in a language.

In the past decade (since 2010), neural networks and deep learning have been rocking the world of NLP, achieving state-of-the-art results on the hardest NLP tasks, such as machine translation. In 2013, we got the word2vec model and its variants. These neural-network-based techniques vectorize words, sentences, and documents in such a way that the distance between vectors in the resulting vector space reflects the difference in meaning between the corresponding entities.
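You can see the intuition behind these vector spaces with a tiny distributional sketch. To be clear, this is not word2vec itself, just the underlying idea that words appearing in similar contexts end up with similar vectors; the mini-corpus is invented:

```python
from collections import Counter
from math import sqrt

# Tiny invented corpus: "cat" and "dog" appear in similar contexts, "car" does not.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the mouse",
    "the dog chased the ball",
    "the car drove down the road",
    "the car parked in the garage",
]

def context_vector(word, window=2):
    """Count words co-occurring with `word` within a +/- `window` token range."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for t in tokens[lo:hi] if t != word)
    return counts

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

cat, dog, car = context_vector("cat"), context_vector("dog"), context_vector("car")
print(cosine(cat, dog) > cosine(cat, car))  # words with similar contexts are closer
```

Real word2vec learns dense vectors with a neural network rather than counting co-occurrences, but the geometric intuition is the same.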

In 2014, sequence-to-sequence models were developed and achieved a significant improvement in difficult tasks, such as machine translation and automatic summarization.

Later, it was discovered that long input sequences were harder to deal with, which led to the attention technique. This improved sequence-to-sequence model performance by letting the model focus on the parts of the input sequence most relevant to each output. The Transformer model takes this further by defining self-attention layers in both the encoder and the decoder.

The cleverly named “Attention Is All You Need” paper, which introduced the Transformer, also enabled the creation of powerful deep learning language models, like: 

  • ULMFiT – Universal Language Model Fine-tuning: a method for fine-tuning any neural-network-based language model for any task, demonstrated in the context of text classification. A key concept behind this method is discriminative fine-tuning, where different layers of the network are trained at different learning rates.
  • BERT – Bidirectional Encoder Representations from Transformers: a modification of the Transformer architecture that keeps the encoders and discards the decoders. It is trained by masking words in the input, which the model must then predict accurately.
  • GPT – Generative Pretrained Transformer: a modification of the Transformer’s encoder-decoder architecture that discards the encoders and retains the decoders with their self-attention sublayers, yielding a fine-tunable language model for NLP.

Recent years have seen the most rapid advances in the NLP field. In the modern NLP paradigm, transfer learning, we can adapt/transfer knowledge acquired from one set of tasks to a different set. This is a big step towards the full democratization of NLP, allowing knowledge to be re-used in new settings at a fraction of the previously required resources.

Why should you build NLP projects?

NLP is at the intersection of AI, computer science, and linguistics. It deals with tasks related to language and information. Understanding and representing the meaning of language is difficult. So, if you want to work in this field, you’re going to need a lot of practice. The projects below will help you do that. 

Building real-world NLP projects is the best way to get NLP skills and transform theoretical knowledge into valuable practical experience. 

Later, when you’re applying for an NLP-related job, you’ll have a big advantage over people who have no practical experience. Anyone can add “NLP proficiency” to their CV, but not everyone can back it up with an actual project to show recruiters.

Okay, we’ve done enough introductions. Let’s move on to the 10 NLP projects that you can start right now. We have beginner, intermediate, as well as advanced projects—choose the one you like, and become the NLP master you’ve always wanted to be!

10 NLP project ideas to boost your resume

We’ll start with beginner-level projects, but you can move on to intermediate or advanced projects if you’ve already done NLP in practice.

Beginner NLP projects

  1. Sentiment analysis for marketing

This type of project can show you what it’s like to work as an NLP specialist. For this project, you want to find out how customers evaluate competitor products, i.e. what they like and dislike. It’s a great business case. Learning what customers like about competing products can be a great way to improve your own product, so this is something that many companies are actively trying to do.

To achieve this task, you will employ different NLP methods to get a deeper understanding of customer feedback and opinion.
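As a warm-up before reaching for deep models, a sentiment baseline can be as simple as a lexicon lookup. The word lists below are invented for illustration, not a real sentiment lexicon:

```python
# Toy lexicon-based sentiment scorer -- a baseline to compare model-based
# approaches against. The word lists are illustrative, not a real lexicon.
POSITIVE = {"great", "love", "excellent", "easy", "fast", "reliable"}
NEGATIVE = {"bad", "hate", "slow", "broken", "expensive", "buggy"}

def sentiment_score(review: str) -> float:
    """Return a score in [-1, 1]: positive minus negative hits over total hits."""
    tokens = review.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

print(sentiment_score("love the battery, fast and reliable"))  # 1.0
print(sentiment_score("slow shipping and a buggy app"))        # -1.0
```

Comparing a simple baseline like this against a trained model is a good habit: it tells you how much the heavier machinery is actually buying you.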

Start project now → Go To Project Repository

  2. Toxic comment classification

In this project, you want to create a model that classifies comments into different categories. Comments on social media are often abusive and insulting. Organizations often want to ensure that conversations don’t get too negative. This project was a Kaggle challenge, where participants had to suggest a solution for classifying toxic comments into several categories using NLP methods.
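A minimal starting point for such a classifier is Naive Bayes built from scratch. The labeled comments below are invented stand-ins for the real Kaggle data, and a real solution would use far more data and stronger models:

```python
import math
from collections import Counter, defaultdict

# Toy Naive Bayes comment classifier. The handful of training comments below
# are invented stand-ins for a real labeled dataset like the Kaggle one.
train = [
    ("you are an idiot and a loser", "toxic"),
    ("shut up nobody asked you", "toxic"),
    ("this is stupid garbage", "toxic"),
    ("thanks for the helpful explanation", "clean"),
    ("great point i agree with you", "clean"),
    ("could you share the source please", "clean"),
]

word_counts = defaultdict(Counter)  # label -> word frequencies
label_counts = Counter()
for text, label in train:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text: str) -> str:
    """Pick the label maximizing log P(label) + sum log P(word|label),
    with Laplace smoothing for unseen words."""
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / len(train))
        total = sum(word_counts[label].values())
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict("you are a stupid loser"))   # toxic
print(predict("thanks that was helpful"))  # clean
```

The Kaggle task is actually multi-label (toxic, obscene, threat, etc.); you would train one such classifier per label, or move to a neural model.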

Start project now → Go To Project Repository

  3. Language identification

This is a good project for beginners to learn basic NLP concepts and methods. You’ve probably seen how Chrome, or another browser, detects the language in which a web page is written. This task becomes a lot easier with machine learning.

You can build your own language detector with the fastText model by Facebook.
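fastText’s pretrained language-identification model relies heavily on character n-grams. Here’s a toy version of that signal, with invented training sentences, just to show the idea:

```python
from collections import Counter

# Toy character-trigram language identifier -- the same basic signal fastText's
# pretrained language-ID model exploits at much larger scale. Samples are invented.
samples = {
    "english": "the quick brown fox jumps over the lazy dog and then runs away",
    "french": "le renard brun saute par dessus le chien paresseux et puis il part",
    "spanish": "el zorro marron salta sobre el perro perezoso y luego se va",
}

def trigrams(text):
    text = f"  {text.lower()}  "  # pad so word boundaries become trigrams too
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

profiles = {lang: trigrams(text) for lang, text in samples.items()}

def detect(text: str) -> str:
    """Return the language whose trigram profile overlaps the input most."""
    grams = trigrams(text)
    def overlap(profile):
        return sum(min(grams[g], profile[g]) for g in grams)
    return max(profiles, key=lambda lang: overlap(profiles[lang]))

print(detect("the dog runs over the fox"))    # english
print(detect("le chien part avec le renard")) # french
```

A real detector would train on millions of sentences per language; fastText ships a pretrained model covering over a hundred languages so you don’t have to.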

Start project now → Go To Project Repository

Intermediate NLP projects

  4. Predict closed questions on Stack Overflow

If you’re a programmer of any kind, I don’t need to tell you what Stack Overflow is. It’s any programmer’s best friend.

Programmers ask many questions on Stack Overflow all the time; some are great, others are repetitive, time-wasting, or incomplete. So, in this project, you want to predict whether a new question will be closed, along with the reason why. 

The dataset has several features, including the question title text, the question body text, tags, post creation date, and more.
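A natural first step is turning those fields into model features. The dictionary keys below are assumptions about the column names, not the dataset’s exact schema:

```python
# Sketch of hand-crafted features from a Stack Overflow post. The dict keys
# mirror the dataset fields named above, but the exact names are assumptions.
def extract_features(post: dict) -> dict:
    title, body, tags = post["title"], post["body"], post["tags"]
    return {
        "title_len": len(title.split()),
        "body_len": len(body.split()),
        "num_tags": len(tags),
        "has_code": "```" in body or "<code>" in body,
        "title_is_question": title.rstrip().endswith("?"),
    }

post = {
    "title": "How do I reverse a list in Python?",
    "body": "I tried <code>reversed(xs)</code> but I am confused.",
    "tags": ["python", "list"],
}
print(extract_features(post))
```

Features like these would feed a classifier alongside text features (e.g. TF-IDF of the title and body) to predict the close reason.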

Start project now → Go To Dataset

  5. Automatic text summarization

Text summarization is one of the most interesting problems in NLP. It’s tedious for us, as humans, to manually summarize a large document of text. 

To solve this problem, we use automatic text summarization. It’s a way of identifying meaningful information in a document and summarizing it while conserving the overall meaning. 

The purpose is to present a shorter version of the original text while preserving the semantics.

In this project, you could use different traditional and advanced methods to implement automatic text summarization, and then compare the results of each method to conclude which is the best to use for your corpus.
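As one of the “traditional” methods to compare against, here is a minimal frequency-based extractive summarizer (a rough sketch, not a production approach):

```python
import re
from collections import Counter

# Minimal frequency-based extractive summarizer -- one of the "traditional"
# methods you might compare against transformer-based abstractive models.
def summarize(text: str, n_sentences: int = 1) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    def score(sentence):
        # Average word frequency: sentences full of common document words rank high.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return " ".join(s for s in sentences if s in top)  # keep original order

doc = ("NLP models process text. Summarization shortens text while keeping meaning. "
       "Extractive methods select existing sentences. Long text is hard to read.")
print(summarize(doc))
```

Comparing this against an abstractive transformer model on the same corpus is exactly the kind of evaluation the project calls for.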

Start project now → Go To Project Repository

  6. Document similarity (Quora question pair similarity)

Quora is a question and answer platform where you can find all sorts of information. Every piece of content on the site is generated by users, and people can learn from each other’s experiences and knowledge. 

For this project, Quora challenged Kaggle users to classify whether question pairs are duplicated or not. 

Identifying duplicate questions helps surface high-quality answers more quickly, improving the Quora experience for writers and readers alike.
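A common first baseline for this challenge is plain word overlap. The 0.5 threshold below is an arbitrary illustration, not a tuned value:

```python
# Word-overlap (Jaccard) baseline for duplicate-question detection.
# The threshold is an arbitrary illustration, not a tuned value.
def jaccard(q1: str, q2: str) -> float:
    a, b = set(q1.lower().split()), set(q2.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def is_duplicate(q1: str, q2: str, threshold: float = 0.5) -> bool:
    return jaccard(q1, q2) >= threshold

print(is_duplicate("how can i learn python", "how can i learn python fast"))    # True
print(is_duplicate("how can i learn python", "what is the capital of france"))  # False
```

Competitive solutions went far beyond this, but a baseline like this gives you a floor to measure fancier models against.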

Start project now → Go To Dataset

  7. Paraphrase detection

Paraphrase detection is the task of checking whether two different pieces of text have the same meaning. It has applications in areas like machine translation, automatic plagiarism detection, information extraction, and summarization. Methods for paraphrase detection fall into two main classes: similarity-based methods and classification methods. 
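A minimal example of the similarity-based class of methods is cosine similarity between term-frequency vectors. Real systems would use embeddings instead of raw counts, and any decision cutoff would need to be tuned on data:

```python
from collections import Counter
from math import sqrt

# Minimal similarity-based paraphrase check: cosine similarity between
# term-frequency vectors. Real systems would use embeddings instead.
def tf_cosine(s1: str, s2: str) -> float:
    v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = sqrt(sum(c * c for c in v1.values())) * sqrt(sum(c * c for c in v2.values()))
    return dot / norm if norm else 0.0

print(tf_cosine("the film was really good", "the movie was truly great"))  # low score
print(tf_cosine("the film was really good", "the film was really good"))   # close to 1.0
```

Notice the weakness this exposes: “film/movie” and “good/great” are paraphrases but share no surface tokens, which is exactly why embedding-based and classification methods exist.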

Start project now → Go To Project Repository

Advanced NLP projects

  8. Generating research paper titles

This is a very innovative project where you want to produce titles for scientific papers. For this project, a GPT-2 model is fine-tuned on more than 2,000 article titles extracted from arXiv. You can apply the same approach to other text generation tasks, like producing song lyrics or dialogue. From this project, you can also learn about web scraping, because you will need to extract titles from research papers to feed your model during training. 
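To see the flavor of text generation at miniature scale, here is a toy Markov-chain title generator. This is a stand-in for actually fine-tuning GPT-2, and the “training” titles are invented examples, not real arXiv data:

```python
import random
from collections import defaultdict

# Toy Markov-chain title generator -- a miniature stand-in for fine-tuning
# GPT-2. The "training" titles are invented examples, not real arXiv data.
titles = [
    "deep learning for text classification",
    "deep learning for image segmentation",
    "attention models for text summarization",
    "attention models for machine translation",
]

chain = defaultdict(list)  # word -> words observed to follow it
for title in titles:
    words = title.split()
    for a, b in zip(words, words[1:]):
        chain[a].append(b)

def generate(start: str, max_len: int = 6) -> str:
    """Walk the chain from `start`, sampling a successor at each step."""
    words = [start]
    while len(words) < max_len and chain[words[-1]]:
        words.append(random.choice(chain[words[-1]]))
    return " ".join(words)

random.seed(0)
print(generate("deep"))
print(generate("attention"))
```

Where a Markov chain only remembers the previous word, GPT-2 conditions on the whole prefix, which is why its titles sound dramatically more coherent.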

Start project now → Go To Project Repository

  9. Translate and summarize news

You can build a web app that translates news articles from Arabic to English and summarizes them, using great Python libraries like newspaper, transformers, and gradio.


Start project now → Useful Link

  10. RESTful API for similarity check

This project is about building a similarity-check API using NLP techniques. The cool part is that you won’t just implement NLP tools: you’ll also learn how to package the API with Docker and deploy it as a web application. In doing this, you will learn how to build a full NLP application. 
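Here’s a sketch of what such an endpoint could look like using only Python’s standard library. A real project would likely use Flask or FastAPI before containerizing with Docker, and the Jaccard similarity function is a deliberately simple stand-in for a proper NLP similarity model:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sketch of a similarity-check REST endpoint using only the standard library.
# A real project would likely use Flask or FastAPI and containerize with Docker.
def similarity(text1: str, text2: str) -> float:
    """Jaccard word overlap -- a simple stand-in for a real similarity model."""
    a, b = set(text1.lower().split()), set(text2.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

class SimilarityHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body like {"text1": "...", "text2": "..."}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"similarity": similarity(payload["text1"], payload["text2"])})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())

# To actually serve:
# HTTPServer(("localhost", 8000), SimilarityHandler).serve_forever()
print(similarity("a b c", "a b d"))
```

Once the service runs locally, writing a Dockerfile that installs dependencies and exposes the port is the natural next step the project walks you through.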

Start project now → Go To Project Repository 

Conclusion

And that’s it! Hope you’re able to pick a project that interests you. Get your hands dirty, and start working on your NLP skills! Building real projects is the single best way to get better at this, and also to improve your resume. 

That’s all for now. Thanks for reading!


READ NEXT

How to Structure and Manage Natural Language Processing (NLP) Projects

Dhruvil Karani | Posted October 12, 2020

If there is one thing I learned working in the ML industry, it is this: machine learning projects are messy.

It is not that people don’t want to have things organized; it is just that many things are hard to structure and manage over the course of the project. 

You may start clean but things come in the way. 

Some typical reasons are:

  • quick data explorations in Notebooks, 
  • model code taken from the research repo on github, 
  • new datasets added when everything was already set,
  • data quality issues are discovered and re-labeling of the data is needed,
  • someone on the team “just tried something quickly” and changed training parameters (passed via argparse) without telling anyone about it,
  • push to turn prototypes into production “just this once” coming from the top.

Over the years working as a machine learning engineer I’ve learned a bunch of things that can help you stay on top of things and keep your NLP projects in check (as much as you can really have ML projects in check:)). 

In this post I will share key pointers, guidelines, tips and tricks that I learned while working on various data science projects. Many things can be valuable in any ML project but some are specific to NLP. 


Continue reading ->



