I’m working on my final-year research paper in AI/Gen-AI/Data Engineering, and I need help choosing the best advanced research topic that I can implement using only free and open-source tools (no GPT-4, no paid APIs, no proprietary datasets).
My constraints:
Must be advanced enough to look impressive both in a research paper and in job interviews
Must be doable in 2 months
Must use 100% free tools (Llama 3, Mistral, Chroma, Qdrant, FAISS, HuggingFace, PyTorch, LangChain, AutoGen, CrewAI, etc.)
The topic should NOT depend on paid GPT models, and it should not be an area where a paid model performs significantly better than open-source ones
Should be relevant to roles like AI Engineer, Gen-AI Engineer, ML Engineer, or Data Engineer
Topics I’m considering:
1. RAG Optimization Using Open-Source LLMs
– Hybrid search, advanced chunking, long-context models, vector DB tuning
2. Vector Database Index Optimization
– Evaluating HNSW, IVF, PQ, ScaNN using FAISS/Qdrant/Chroma
3. Open-Source Multi-Agent LLM Systems
– Using CrewAI/AutoGen with Llama 3/Mistral to build planning & tool-use agents
4. Embedding Model Benchmarking for Domain Retrieval
– Comparing E5, bge-large, mpnet, SFR, MiniLM for semantic search tasks
5. Context Compression for Long-Context LLMs
– Implementing summarization + reranking + filtering pipelines
What I need advice on:
Which topic gives the best job-market advantage?
Which one is realistically doable in 2 months by one person?
Which topic has the strongest open-source ecosystem, with no need for GPT-4?
Which topic has the best potential for a strong research paper?
Any suggestions or personal experience would be really appreciated!
Thanks