Israel Photos
DatasetIsrael Photos Dataset A collection of 369 photographs captured across Israel between 2024 and 2025, with LLM-generated captions and location annotations. The images are sourced from the photographer's Pexels gallery. About This Collection This dataset was deliberately curated to provide a diverse visual representation of Israel, encompassing: Varied locations: From the historic streets of Jerusalem's Old City to Tel Aviv's urban landscape, desert vistas in the Negev, and… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Israel-Photos.
Sample Voice Context Data
DatasetSample Voice Context Data A small synthetic dataset containing LLM-generated context information simulating a job seeker narrating their career trajectory. Purpose This dataset was created to test a voice-to-vector-database RAG pipeline. The workflow being evaluated involves: Voice data (MP3 recordings) transcribed to text Transcriptions reformatted as structured context data Text data upserted into a vector database (Pinecone or Ragie) Retrieval accuracy tested by… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Sample-Voice-Context-Data.
Whisper WPM Test
DatasetWhisper WPM Test Dataset A dataset of audio recordings in various speaking styles and content types, designed for evaluating speech-to-text transcription accuracy and words-per-minute (WPM) analysis. Overview This dataset contains recordings across diverse speaking styles (casual, formal, narrative) and content domains (business, technical, educational). The variety in speaking rates and styles makes it ideal for testing ASR model performance across different speech… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Whisper-WPM-Test.
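As a rough illustration of the words-per-minute analysis mentioned above, here is a minimal sketch. It assumes a plain-text transcript and a known recording duration in seconds. The function name and the whitespace-based word count are assumptions made for illustration, not the dataset's actual method.

```python
def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Return the speaking rate in words per minute for one recording.

    Words are counted by splitting on whitespace (an assumption; the
    dataset may count words differently).
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds


# Hypothetical example: a 9-word sentence spoken in 3 seconds.
rate = words_per_minute("the quick brown fox jumps over the lazy dog", 3.0)
print(rate)
```

Comparing this figure across the casual, formal, and narrative recordings is one simple way to check whether transcription accuracy changes with speaking rate.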
Tech Sentences For ASR Training
DatasetTechVoice Dataset Work in Progress: this dataset is actively being expanded with new recordings. Dataset Statistics: Duration 38m 43s of a 5h 0m 0s target (12.9%); Words 10,412 of a 50,000 target (20.8%); Total Recordings: 205 samples; Total Characters: 74,312. A specialized speech dataset for fine-tuning Automatic Speech Recognition (ASR) models on technical and developer vocabulary. Contains human-recorded… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Tech-Sentences-For-ASR-Training.
Whisper Fine Tune One Shot Eval
DatasetWhisper Fine-Tuning Evaluation: Local vs Commercial ASR A "back of the envelope" evaluation comparing fine-tuned Whisper models running locally against commercial ASR APIs via Eden AI. The Question Can fine-tuning Whisper achieve measurable WER reductions, even when comparing local inference against cloud-based commercial models? TL;DR Yes. Fine-tuned Whisper Large Turbo running locally achieved 5.84% WER, beating the best commercial API (Assembly at… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Whisper-Fine-Tune-One-Shot-Eval.
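For readers unfamiliar with the WER figures quoted above, the sketch below shows the standard word error rate calculation: word-level edit distance divided by the number of reference words. The lowercasing and whitespace tokenization are assumptions. The evaluation itself may normalize text differently, so this is illustrative rather than a reproduction of its scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four.
print(word_error_rate("deploy the docker container", "deploy a docker container"))
```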
English Hebrew Mixed Sentences
DatasetEnglish-Hebrew Mixed Sentences Dataset A dataset of English sentences with Hebrew words and phrases interspersed, designed for speech-to-text training and evaluation for English speakers in Israel. Overview This dataset addresses a common challenge for English-speaking immigrants in Israel: standard speech-to-text (STT) systems struggle to accurately transcribe code-switched speech where Hebrew words are mixed into primarily English sentences. Example: "I need to pick up… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/English-Hebrew-Mixed-Sentences.
Open Router API Pricing Analysis
DatasetOpenRouter API Pricing Analysis Dataset Overview This dataset provides a point-in-time capture of pricing and parameters for LLMs available through the OpenRouter API for inference. Contents Raw Data (raw/) Contains the original data extracted from the OpenRouter API, including: Model pricing (input/output token costs) Model parameters and specifications Computed fields such as output/input token price ratios Enhanced Data (hf-enhanced/)… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Open-Router-API-Pricing-Analysis.
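The computed output/input price ratio mentioned above can be sketched as follows. The function name, the example prices, and the handling of zero-priced models are all assumptions for illustration. The dataset's own column names and conventions are not shown in this summary.

```python
from typing import Optional


def output_input_ratio(input_price: float, output_price: float) -> Optional[float]:
    """Return output price divided by input price, or None if undefined.

    A zero input price (for example, a free model) makes the ratio
    undefined, so None is returned instead of raising an error.
    """
    if input_price < 0 or output_price < 0:
        raise ValueError("prices must be non-negative")
    if input_price == 0:
        return None
    return output_price / input_price


# Hypothetical prices per million tokens, not taken from the dataset.
print(output_input_ratio(3.0, 15.0))
print(output_input_ratio(0.0, 0.0))
```

A ratio well above 1 indicates a model whose generated tokens cost far more than its prompt tokens, which matters most for workloads with long outputs.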
Accidental and Low-Quality Photos Dataset This repository contains unintentional photos from camera rolls - accidental captures, blurry shots, and other non-intentionally captured images. Purpose The intended use case is training an image organization model to automatically distinguish and suggest for deletion non-intentionally captured images. This can help users efficiently clean up their camera rolls by identifying photos that were taken accidentally or are of poor… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/accidental-and-low-quality-photos.
Multimodal Ai Taxonomy
DatasetMultimodal AI Taxonomy A comprehensive, structured taxonomy for mapping multimodal AI model capabilities across input and output modalities. Dataset Description This dataset provides a systematic categorization of multimodal AI capabilities, enabling users to: Navigate the complex landscape of multimodal AI models Filter models by specific input/output modality combinations Understand the nuanced differences between similar models (e.g., image-to-video with/without audio… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/multimodal-ai-taxonomy.
Jerusalem High Rise Development
DatasetJerusalem High-Rise Development Image Dataset Overview This dataset contains 56 photographs documenting high-rise buildings and urban development in Jerusalem, Israel. The images capture the architectural evolution of Jerusalem's modern skyline, featuring contemporary construction, building facades, and urban landscapes. Purpose This dataset has been created and shared for the following purposes: Image fine-tuning and AI training: High-quality architectural… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Jerusalem-High-Rise-Development.
Hebrew Language Signage
DatasetHebrew Language Signage Dataset Overview This dataset contains photographs of Hebrew language text in everyday contexts throughout Israel, with a particular focus on signage displays including street signs, commercial signage, and public information displays. Dataset Details Total Images: 68 Format: PNG Content: Real-world photographs of Hebrew text and signage Language Coverage: Primarily Hebrew, with many signs also containing English and Arabic text… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Hebrew-Language-Signage.
Tel Aviv Pics
DatasetTel Aviv Urban Photography Dataset Dataset Description This dataset contains 53 high-quality photographs of Tel Aviv's urban environment, captured to serve as reference material for game development, 3D world creation, and digital environment design. Dataset Summary Total Images: 53 photographs Location: Tel Aviv, Israel Format: JPG Average Size: ~1MB per image Resolution: High-resolution photographs suitable for texture extraction and reference License:… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Tel-Aviv-Pics.
Jerusalem Streetscapes
DatasetJerusalem Streetscapes Dataset A small image dataset containing photos of the rapidly changing urban landscape of Jerusalem, Israel, captured by day and by night. About This dataset documents the evolving cityscape of Jerusalem through 120 photographs taken between June 2024 and September 2025. Dataset Details Number of Images: 120 Time Period: June 2024 - September 2025 Location: Jerusalem, Israel Coverage: Day and night photography of urban landscapes… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Jerusalem-Streetscapes.
Narcissistic Abuse AI Support Configurations A comprehensive network of AI agent configurations designed to provide support for individuals affected by relationships with personality disordered individuals, particularly those with Cluster B disorders and narcissistic personality disorder. Important Disclaimer These tools are not replacements for professional mental health support. They are intended as adjuncts to professional therapeutic care. Most configurations contain… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Narcissistic-Abuse-Support-Configs.
Jerusalem Public Shelter Dataset - September 2025 This repository contains updated location data for public shelters in the Jerusalem area, populated on September 19th, 2025. The data has been processed and enhanced to support individual preparedness efforts and geolocation applications. Data Source The original data was provided by the Jerusalem Municipality and is available in the source_data folder. This dataset represents the most current information available as… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Jerusalem-Emergency-Shelters-0925.
Code Gen Agents 0925
DatasetCode Generation Agent Network A comprehensive collection of specialized AI agents for code generation, development workflows, and project management. While originally designed for Claude Code, these agent specifications are framework-agnostic and can be adapted to work with any AI code generation platform or multi-agent system. Framework Agnostic Design This repository contains agent specifications that define: Clear role definitions and capabilities Tool requirements… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Code-Gen-Agents-0925.
ISO 3166 4217 Consolidated
DatasetISO-3166 & ISO-4217 "Consolidated" Lookup Dataset This dataset contains a mapping between ISO-3166 (countries) and ISO-4217 (currencies). The objective was to create a single dataset to support everyday workloads in international financial analysis undertaken by "casual" / non-official actors and analysts. Version V1 Compiled by: Daniel Rosehill Date: 03-09 (September) - 2025 Note: geopolitics and the global financial system are clearly dynamic concepts. Much as this… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/ISO-3166-4217-Consolidated.
NVR Entity Recognition Experiment Overview This repository contains a training dataset designed for entity recognition in Network Video Recorder (NVR) applications, specifically focused on newborn safety monitoring. The dataset uses a stuffed animal as a privacy-conscious substitute for actual newborn footage, enabling the development of computer vision models that can identify critical safety scenarios in nursery environments. Purpose The primary goal of this… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/NVR-Entity-Recognition-Experiment.
Long Prompt Experiment
DatasetI conducted this experiment to investigate the impact of prompt structure and optimization on LLM performance, specifically testing whether quality and organization matter more than raw prompt length for complex technical tasks. Research Question For specialized technical tasks, does prompt structure and optimization have a greater impact on output quality than raw prompt length alone? Experiment Design I compared three distinct prompting approaches using Gemini 2.5 Lite… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Long-Prompt-Experiment.
Refactor and HF dataset (including texts): Daniel Rosehill Source data: International Foundation for Valuing Impacts This dataset provides V2 of a refactoring of the Global Value Factor Database (GVFD) by the International Foundation for Valuing Impacts intended to enhance the original dataset for machine readability and integration into data analysis and visualization workloads. The International Foundation for Valuing Impacts (IFVI) produces an (open-source) database called the Global… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Global-Value-Factor-Database-Refactor-V2.
Voice Note Audio
DatasetVoice Notes Dataset Dataset Description This dataset contains real-world voice recordings with transcripts and comprehensive annotations. Dataset Statistics Total Entries: 2 Audio Files: 2 Uncorrected Transcripts: 2 Ground Truth Transcripts: 0 Annotation Files: 2 Export Date: 2025-10-27 Dataset Structure audio/ # Audio recordings (MP3, etc.) ├── 1.mp3 ├── 2.mp3 └── ... transcripts/ ├── uncorrected/ # Original STT… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Voice-Note-Audio.
STT Voice Notes Evals
DatasetSTT Voice Note Evaluation Author: Daniel Rosehill. Date Created: August 11, 2025. Purpose: Comparative evaluation of Speech-to-Text (STT) services for voice note transcription. Overview This dataset was created as part of ongoing work developing voice note transcription systems. It contains ground truth transcripts representing typical daily voice notes, recorded to evaluate and compare STT service accuracy across different content types. Speaker Profile: Single speaker… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/STT-Voice-Notes-Evals.
System Prompt Library 030825
DatasetSystem Prompts Dataset - August 2025 Point-in-time export from Daniel Rosehill's system prompt library as of August 3rd, 2025 Overview This repository contains a comprehensive collection of 944 system prompts designed for various AI applications, agent workflows, and conversational AI systems. While many of these prompts now serve as the foundation for more complex agent-based workflows, they continue to provide essential building blocks for AI system design and… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/System-Prompt-Library-030825.
Text Transformation Prompt Library A comprehensive collection of text transformation prompts for reformatting dictated text into various formats, styles, and structures. Quick Links Repository Structure /prompts/ The main collection of text transformation prompts. /prompts/md/ - Markdown format prompts /prompts/json/ - JSON format equivalents of the markdown prompts Prompt Structure Each prompt follows a standardized markdown… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Text-Transformation-Prompts-300525.
System Prompt Library
DatasetMy AI System Prompt Library This repository contains a comprehensive, up-to-date library of system prompts for AI systems and autonomous agents, started on May 27th, 2025. Overview This collection houses 923 system prompts covering a diverse range of AI applications. The prompts include configurations for autonomous agents, simple chatbots, specialized assistants, and various AI-powered tools. This repository serves as a centralized hub for these prompts, maintained… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/System-Prompt-Library.
Pay For Outcomes Instruments
DatasetSocial-Impact-Bond-Data This repository contains a curated, redacted, and standardized data set based on the Government Outcomes Lab project at Oxford University (UK). It is the leading international data resource tracking the growth and execution of social impact bonds (SIBs), development impact bonds (DIBs), outcome funds, and other pay-for-success instruments worldwide. Project Purpose The data set supports research and AI-driven policy analysis on innovative… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/pay-for-outcomes-instruments.
Israel Alerting Zones
DatasetIsrael Emergency Alerting Zones Dataset This repository contains a comprehensive list of emergency alerting zones used in Israel by the Home Front Command (Pikud HaOref), compiled on May 9th, 2025. Dataset Description This dataset provides a point-in-time export of the alerting areas used by Israel's Home Front Command for issuing emergency alerts during security situations. The alerting zones are primarily used for missile threat notifications and other emergency… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Israel-Alerting-Zones.
Career Data Context Repo
DatasetHello, Friendly AI Bot! Context Generation Date: 28 / April / 2025 Creation Timestamp: 2025-04-28T18:59:13Z Welcome! If you are able to read and parse this text, then this context data repository is working as intended. You have arrived at a small, modular pool of contextual data designed to provide insight into my career aspirations, professional experience, and work preferences. Refer to the "Context Generation Date" above, or if you are able to parse file metadata, use… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Career-Data-Context-Repo.
Software Wish List Context Data
DatasetHello, Friendly AI Bot! Context Generation Date: 28 / (April) / 2025 If you're reading this, then the context pipeline is working as intended, and you have arrived at a small repository of contextual data intended to provide you with general background context about what I look for in software evaluations. As you probably already know, my name is Daniel. I'm a huge fan of technology. And I frequently find myself looking for software tools. Sometimes I do this for work… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Software-Wish-List-Context-Data.
Corn Training Set
DatasetCorn The Sloth Training Images (Repo 2) This repository is another collection of images of a stuffed sloth, which I am using to fine-tune a custom image generation model for this particular character. If anyone else is interested in fine-tuning for this character, or for character avatars generally, or wants to use this small image set as training data for another project, then use is granted in accordance with the license terms (the sloth didn't quite understand… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Corn-Training-Set.
Shakespeare GPT (Shakespearean Text Generation Prompts) Welcome to what might be the internet's largest collection of prompts for rewriting text in Shakespearean English! This repository contains a variety of prompts designed to transform modern text into the style of Shakespeare, organized by format and purpose. These prompts can be used with any AI tool that accepts custom instructions. A user interface may be forthcoming for those who feel the need to do this regularly.… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Shakespearean-Text-Transformation-Prompts.
Corn The Sloth
DatasetKeyframe Photos Of ..... A Sloth Plushie This repository contains a small collection of images featuring an adorable plush sloth named Cornelius. These images are part of ongoing experiments conducted in my spare time, aimed at testing the capabilities of photogrammetry tools and digital avatar creation for non-human subjects. If anyone needs a small repository of images for various purposes, such as distinguishing plushies from real animals or humans, you are welcome to utilize… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Corn-The-Sloth.
Text To Image Test Prompts
DatasetText To Image Test Prompt Library A comprehensive collection of evaluation prompts for testing text-to-image AI models across diverse parameters and use cases. Overview This repository contains a structured set of test prompts designed to evaluate the capabilities of text-to-image generation models. Rather than focusing on formal evaluation metrics, these prompts are intended for end users who want to test how well a model might perform for their specific use cases.… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Text-To-Image-Test-Prompts.
General Purpose System Prompts
Dataset🤖 Just A Few ... "General" System Prompts Here is a quandary that those who work with LLMs via API through self-hosted chat interfaces (etc) are familiar with: Without any system prompt at all (at least one visible to the user), the default model behavior feels a little bit flat and lifeless. With a deterministic system prompt, a model effectively becomes an "assistant" (and with context and API actions, a full-fledged agent). I haven't found a word yet for the kind of light… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/General-Purpose-System-Prompts.
Prompt Eng System Prompts
DatasetPrompt Engineering System Prompts A curated collection of system prompts designed to assist with prompt engineering activities across various AI platforms and use cases. Last updated: April 6, 2025 Note: This is an ongoing collection. New system prompts are continuously being added to the library. Feel free to check back for updates. Repository Purpose This repository serves as a comprehensive resource for prompt engineers, AI enthusiasts, and developers who want to:… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Prompt-Eng-System-Prompts.
Email Management System Prompts
DatasetEmail Management System Prompts April 06 2025 A collection of system prompts designed to enhance email productivity, communication, and management. These prompts can be used with various AI assistants to automate and improve email-related tasks. Categories Email Composition Email Template Generator - Creates customizable email templates for various purposes Email Rewriter - Reformats and improves existing email drafts Email Signature Generator - Creates… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Email-Management-System-Prompts.
Data Utils System Prompts
DatasetData Utilities System Prompts This repository contains a collection of system prompts for configuring AI assistance in data-related tasks. These prompts can be used to set up AI assistants for various data operations, analysis, and management tasks. Categories Data Conversion Tools for converting data between different formats (CSV, JSON, natural language, etc.) Database Helpers Assistants for working with different databases (MongoDB, Neo4j… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Data-Utils-System-Prompts.
Geopolitical System Prompts
DatasetGeopolitical Analysis System Prompts A collection of system prompts for AI assistants specialized in geopolitical analysis. These prompts enable AI systems to provide structured analysis, reporting, and insights across various aspects of international relations and geopolitical developments. Repository Structure The prompts are organized into the following categories: regional-analysis/ Specialized prompts for analyzing specific regions and generating… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Geopolitical-System-Prompts.
Career Related System Prompts
DatasetCareer Utilities System Prompts A collection of system prompts for AI assistants focused on career guidance and job search assistance. These prompts are designed to help users navigate various aspects of career development, from resume writing to job searching and professional networking. Repository Structure The prompts are organized into the following categories: career-exploration/: Prompts for exploring career paths, understanding industry trends, and making career… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Career-Related-System-Prompts.
Speech To Text System Prompts 2
DatasetSpeech To Text System Prompt Library This repository provides a collection of system prompts designed to transform and refine text captured using speech-to-text technologies. By passing STT outputs through large language models with these specialized prompts, you can achieve cleaner, more structured, and purpose-specific text formats. 📋 The Idea Here is the basic implementation. I don't pretend that this is the stuff of high AI engineering. But it does create quite… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Speech-To-Text-System-Prompts-2.
Single Prompt Book
DatasetCan AI Write A Book In Just One Prompt? April 09, 2025 The pace of development in AI these days is so fast that it's hard to keep on top of all the latest developments. I've always found it interesting that among all the hotly debated parameters discussed in the most recent SOTA models, the question of how many tokens a model can generate in one continuous output (max output tokens) seems to be very little discussed. This metric exists independently of the maximum input tokens and… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Single-Prompt-Book.
Writing System Prompts
DatasetWriting-Related System Prompt Collection This is a collection of system prompts derived from my larger collection of system prompts. What they have in common is that they are intended to assist with writing, specifically text reformatting, editing, and proofing. This is a partial collection that will hopefully continue to evolve and grow over time. The system prompts are organised into folders representing a common purpose and within each folder each system… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Writing-System-Prompts.
GHG Emissions Data
DatasetGHG Emissions Data Pipeline Description This repository contains a comprehensive pipeline for processing and analyzing greenhouse gas (GHG) emissions data. The pipeline integrates datasets from multiple sources, including Climate TRACE and Our World in Data, to provide insights into global emissions trends. It supports sustainability reporting, emissions tracking, and climate action planning. Dataset Details Sources and Methodologies The pipeline… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/GHG-Emissions-Data.
Ifvi Valuefactors Deriv
Dataset⚠️ DEPRECATED - Dataset Superseded by V2 This refactored IFVI value factor dataset has been supplanted by V2. This dataset tracked the V2 of the IFVI release that was updated in March 2025. The V2 of the refactored analytical dataset tracking the GVFD was released on August 20th, 2025 and is now available at: 🔗 IFVI Global Value Factors Dataset V2 Please use the V2 dataset for all new projects and analysis.
Visit Hugging Face
For the most up-to-date collection of datasets, visit my Hugging Face profile