Building a Personal Blog with GitHub Pages: Automation with Claude

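For developers, a blog is one of the best ways to record technical growth. GitHub Pages combines three advantages: free hosting, a Git-based workflow, and Markdown support. Add Claude, and most of the process, from building the blog to drafting posts, can be automated.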

Why GitHub Pages

There are many blogging platforms, but for developers the case for GitHub Pages is clear.

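Item	GitHub Pages	WordPress	Notion
Cost	Free	Hosting fees	Free (limited)
Customization	Full control	Depends on themes/plugins	Limited
Version control	Built into Git	Separate backups needed	Limited history
Deployment	Automatic on `git push`	FTP/dashboard	Reflected immediately
Markdown	Native support	Plugin required	Partial support
Custom domain	Supported for free	Paid plan	Paid plan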

The key point is that you can manage the blog like code. Writing a post is a commit, and changing the design is a PR.

Step 1: Create a GitHub repository

A GitHub Pages user site uses a repository named username.github.io.

# Create a new repository on GitHub, then clone it
git clone https://github.com/username/username.github.io.git
cd username.github.io

When the repository is named username.github.io, GitHub automatically treats it as a Pages site. Once content is pushed, it is typically served at https://username.github.io without further configuration.
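As a rough illustration of the naming rule, the sketch below derives the default address for a repository. The function name `pages_url` is hypothetical, the project-site branch (any repository not named after the owner) goes slightly beyond what this post covers, and custom domains are ignored.

```python
def pages_url(owner: str, repo: str) -> str:
    """Return the default GitHub Pages URL for a repository (no custom domain)."""
    host = f"{owner.lower()}.github.io"
    # A repository named <owner>.github.io is the user site, served from the root.
    if repo.lower() == host:
        return f"https://{host}/"
    # Any other repository is a project site, served under its own path.
    return f"https://{host}/{repo}/"


print(pages_url("username", "username.github.io"))
print(pages_url("username", "notes"))
```

The second call matters for the `baseurl` setting later on: a project site lives under a path, while the user site built in this post lives at the root.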

Step 2: Install and initialize Jekyll

Jekyll is the static site generator that GitHub Pages supports natively.

Install Ruby and Jekyll

# macOS (Homebrew)
brew install ruby
gem install jekyll bundler

# Ubuntu/Debian
sudo apt install ruby-full build-essential
gem install jekyll bundler

Create a new Jekyll site

jekyll new . --force

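This command scaffolds the site. A typical Jekyll blog is organized as follows (with a gem-based theme such as minima, `jekyll new` creates only part of this tree; directories like `_layouts/`, `_includes/`, and `assets/` appear once you add your own overrides):

username.github.io/
├── _posts/           # Blog posts
├── _layouts/         # HTML layout templates
├── _includes/        # Reusable HTML fragments
├── _config.yml       # Jekyll configuration file
├── assets/           # CSS, JS, images
├── index.html        # Home page
├── Gemfile           # Ruby dependencies
└── 404.html          # 404 page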
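Before running the local server, it can help to confirm that the working directory really looks like a Jekyll site. The following is a minimal sketch: the helper name `missing_entries` and the choice of `_config.yml`, `Gemfile`, and `_posts` as the minimal set are assumptions made for illustration, not anything Jekyll itself enforces.

```python
from pathlib import Path

# Assumed minimal set; themes and plugins may add or relocate other files.
EXPECTED = ("_config.yml", "Gemfile", "_posts")


def missing_entries(site_root: Path) -> list[str]:
    """Return the expected files or directories that are absent from site_root."""
    return [name for name in EXPECTED if not (site_root / name).exists()]


if __name__ == "__main__":
    # An empty list means all assumed entries are present.
    print(missing_entries(Path(".")))
```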

Run the local server

bundle exec jekyll serve

Open http://localhost:4000 in a browser to check the site. With the --livereload option, the browser refreshes automatically whenever a file changes.

bundle exec jekyll serve --livereload

Step 3: Configure _config.yml

_config.yml holds the site-wide settings. Adjust the defaults to suit your site.

title: "Dev & Life"
description: "A developer's journey"
author: "Hong Gildong"
url: "https://gildong.github.io"
baseurl: ""

# Markdown processing
markdown: kramdown
highlighter: rouge

# Permalink structure
permalink: /:year/:month/:day/:title/

# Plugins
plugins:
  - jekyll-paginate
  - jekyll-seo-tag
  - jekyll-sitemap

# Pagination
paginate: 6
paginate_path: "/page/:num/"

# Excluded from the build
exclude:
  - Gemfile
  - Gemfile.lock
  - README.md
  - node_modules

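Key settings:

title / description: The site title and description. Also used in SEO meta tags.
permalink: Determines the URL structure. Including the date keeps post URLs distinct even when titles repeat.
plugins: jekyll-seo-tag generates meta tags automatically, and jekyll-sitemap builds a sitemap for search engines.
paginate: Sets the number of posts shown per page, starting with the home page.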
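To make the permalink and paginate settings concrete, here is a small sketch of how the patterns in the configuration above turn into URLs. It only imitates the placeholder substitution. The helper names `expand_permalink` and `pagination_paths` are hypothetical, and the slug is assumed to come from the post filename.

```python
from datetime import date
from math import ceil


def expand_permalink(pattern: str, post_date: date, slug: str) -> str:
    """Fill the placeholders used by the permalink pattern in this post."""
    values = {
        ":year": f"{post_date.year:04d}",
        ":month": f"{post_date.month:02d}",
        ":day": f"{post_date.day:02d}",
        ":title": slug,
    }
    url = pattern
    for placeholder, value in values.items():
        url = url.replace(placeholder, value)
    return url


def pagination_paths(post_count: int, per_page: int, path: str = "/page/:num/") -> list[str]:
    """List the URLs of the paginated index pages; page 1 is the home page."""
    pages = max(1, ceil(post_count / per_page))
    return ["/"] + [path.replace(":num", str(n)) for n in range(2, pages + 1)]


print(expand_permalink("/:year/:month/:day/:title/", date(2026, 2, 18), "first-post"))
print(pagination_paths(post_count=14, per_page=6))
```

With `paginate: 6`, fourteen posts spread over three index pages: the home page plus `/page/2/` and `/page/3/`.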

Step 4: Write the first post

Jekyll posts are saved in the _posts/ directory with filenames of the form YYYY-MM-DD-title.md.
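Since the filename carries both the date and the URL slug, it is worth generating it rather than typing it by hand. The sketch below shows one way to do that. The helper names and the ASCII-only slug rule are assumptions chosen for simplicity, not Jekyll's own slug logic.

```python
import re
from datetime import date

# Matches the YYYY-MM-DD-title.md convention (also accepting .markdown).
POST_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.(md|markdown)$")


def post_filename(post_date: date, title: str) -> str:
    """Build a `_posts/` filename from a date and an ASCII title."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError("title must contain at least one ASCII letter or digit")
    return f"{post_date.isoformat()}-{slug}.md"


def is_post_filename(name: str) -> bool:
    """Check that a filename follows the YYYY-MM-DD-title.md convention."""
    return POST_NAME.match(name) is not None


print(post_filename(date(2026, 2, 18), "Hello, GitHub Pages!"))
```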

---
layout: post
title: "First Post"
date: 2026-02-18
tags: [blog, jekyll, github-pages]
---

## Starting the blog

This blog is built with GitHub Pages and Jekyll.
Posts are written in Markdown and deployed with `git push`.

### Clean code blocks too

```python
print("Hello, Blog!")
```

**Bold**, *italic*, and `inline code` are all supported.

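Front matter (the YAML header) defines a post's metadata. layout, title, and date are the fields most posts set, though Jekyll can infer the date and title from the filename; optional fields such as tags, categories, and excerpt can be added as needed. The official Jekyll documentation lists the full set of front matter options.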
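Front matter is the part of a post that lends itself most readily to automation. The sketch below renders a block like the one in the sample post. The function name `front_matter` is hypothetical, and it assumes tags are plain words that need no YAML quoting.

```python
from datetime import date


def front_matter(title: str, post_date: date, tags: list[str], layout: str = "post") -> str:
    """Render a YAML front matter block like the one in the sample post."""
    # Escape backslashes and double quotes so the title stays a valid YAML string.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "---",
        f"layout: {layout}",
        f'title: "{escaped}"',
        f"date: {post_date.isoformat()}",
        f"tags: [{', '.join(tags)}]",
        "---",
    ]
    return "\n".join(lines) + "\n"


print(front_matter("First Post", date(2026, 2, 18), ["blog", "jekyll", "github-pages"]))
```

Writing the returned string to a new file in `_posts/`, followed by the Markdown body, yields a post that Jekyll will pick up on the next build.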

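Step 5: Customize the theme

Jekyll's default theme, minima, is a reasonable start, but customizing it yourself gives you control over the layout and styling.

Layout structure

Jekyll uses the Liquid template engine. HTML templates defined in the _layouts/ directory give every page that uses them a consistent structure.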

<!-- _layouts/default.html -->
<!DOCTYPE html>
<html lang="ko">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>{{ page.title }} | {{ site.title }}</title>
  <link rel="stylesheet" href="/assets/css/style.css">
  {% seo %}

</head>
<body>
  <header>
    <nav>
      <a href="/">{{ site.title }}</a>
      <a href="/about">About</a>
    </nav>
  </header>
  <main>
    {{ content }}
  </main>
  <footer>
    <p>&copy; 2026 {{ site.author }}</p>
  </footer>
</body>
</html>
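Jekyll builds a page by substituting values such as the page title and the rendered post body into Liquid placeholders in the layout, and layouts can wrap one another. The toy sketch below imitates only that substitution step with a regular expression. It is not Liquid (no tags, filters, or loops), and the two template strings and the `render` helper are made up for illustration.

```python
import re

# Matches simple {{ name }} placeholders such as {{ page.title }}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, context: dict[str, str]) -> str:
    """Replace each {{ name }} with its value; unknown names become empty strings."""
    return PLACEHOLDER.sub(lambda m: context.get(m.group(1), ""), template)


default_layout = "<title>{{ page.title }} | {{ site.title }}</title><main>{{ content }}</main>"
post_layout = "<article><h1>{{ page.title }}</h1>{{ content }}</article>"

context = {"page.title": "First Post", "site.title": "Dev & Life"}

# Inner layout first: the post body becomes `content` for post_layout,
# and that result becomes `content` for default_layout.
inner = render(post_layout, {**context, "content": "<p>Hello, Blog!</p>"})
page = render(default_layout, {**context, "content": inner})
print(page)
```

The nesting order is the point of the example: the innermost layout is rendered first, and each outer layout receives the result as its `content`.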

A post-specific layout inherits from default:

<!-- _layouts/post.html -->
---
layout: default
---
<article>
  <h1>{{ page.title }}</h1>
  <time datetime="{{ page.date | date_to_xmlschema }}">
    {{ page.date | date: "%Y년 %m월 %d일" }}
  </time>
  <div class="tags">
    {% for tag in page.tags %}
      <span class="tag">{{ tag }}</span>
    {% endfor %}
  </div>
  <article class="post-full">
  <div class="post-content">
    <blockquote>
  <p><strong>논문 정보</strong></p>
  <ul>
    <li><strong>Title</strong>: scDiffusion: conditional generation of high-quality single-cell data using diffusion model</li>
    <li><strong>Authors</strong>: Erpai Luo, Minsheng Hao, Lei Wei, Xuegong Zhang</li>
    <li><strong>Affiliation</strong>: MOE Key Lab of Bioinformatics, Tsinghua University, Beijing</li>
    <li><strong>Journal</strong>: Bioinformatics, Volume 40, Issue 9, btae518 (2024)</li>
    <li><strong>DOI</strong>: <a href="https://doi.org/10.1093/bioinformatics/btae518">10.1093/bioinformatics/btae518</a></li>
  </ul>
</blockquote>

<hr />

<h2 id="연구-배경-및-동기">연구 배경 및 동기</h2>

<p>단일세포 RNA 시퀀싱(scRNA-seq)은 세포 수준에서 생명 현상을 연구하는 데 핵심적인 기술이다. 그러나 충분한 양의 고품질 scRNA-seq 데이터를 확보하는 것은 여전히 어려운 과제다. 실험 비용이 높고, 특히 <strong>희귀 세포 유형(rare cell type)</strong>의 경우 충분한 수의 세포를 포착하기 어렵다.</p>

<p>이 문제를 해결하기 위해 <strong>생성 모델(generative model)</strong>을 활용하여 합성 scRNA-seq 데이터를 생성하려는 시도가 있어왔다. 기존 방법으로는 scGAN(GAN 기반), scDesign3(통계 학습 기반), SPARSim, SCRIP 등이 있으나, 생성된 데이터의 현실성이 부족하고, 특히 <strong>조건부 생성(conditional generation)</strong> — 특정 세포 유형이나 조직에 맞춰 데이터를 생성하는 것 — 에서 한계를 보였다.</p>

<p>한편, 이미지 생성 분야에서는 <strong>확산 모델(diffusion model)</strong>이 높은 충실도(fidelity)의 데이터 생성에서 압도적인 성능을 보여주고 있었다. 저자들은 이 확산 모델의 장점을 scRNA-seq 데이터 생성에 접목하여, <strong>scDiffusion</strong>이라는 새로운 생성 모델을 제안한다.</p>

<hr />

<h2 id="모델-아키텍처">모델 아키텍처</h2>

<p><img src="https://github.com/EperLuo/scDiffusion/raw/main/model_archi.png" alt="scDiffusion 모델 아키텍처 — 오토인코더(SCimilarity), 디노이징 네트워크, 조건부 분류기로 구성된 잠재 확산 모델" />
<em>Figure 1: scDiffusion의 전체 아키텍처 개요. 학습 단계에서 SCimilarity 인코더가 유전자 발현을 잠재 공간으로 변환하고, 확산 과정을 통해 노이즈를 추가한 뒤 디노이징 네트워크가 역방향 과정을 학습한다. 추론 단계에서는 가우시안 노이즈로부터 새로운 세포 임베딩을 생성한다. (출처: Luo et al., Bioinformatics 2024)</em></p>

<p>scDiffusion은 <strong>잠재 확산 모델(Latent Diffusion Model, LDM)</strong>과 <strong>파운데이션 모델(foundation model)</strong>을 결합한 구조로, 세 가지 핵심 모듈로 구성된다:</p>

<h3 id="1-오토인코더-autoencoder--scimilarity">1. 오토인코더 (Autoencoder) — SCimilarity</h3>

<ul>
  <li>사전 학습된 파운데이션 모델 <strong>SCimilarity</strong>를 오토인코더로 활용</li>
  <li>SCimilarity는 <strong>399개의 scRNA-seq 연구에서 수집한 2,270만 개 세포</strong>로 사전 학습된 인코더-디코더 네트워크</li>
  <li>유전자 발현 프로파일을 <strong>128차원 잠재 공간(latent space)</strong>으로 압축</li>
  <li>원시 분포를 <strong>가우시안 유사 분포</strong>로 변환하여 확산 과정에 적합한 형태로 정규화</li>
  <li>사전 학습 가중치를 기반으로 미세 조정(fine-tuning)하여, 처음부터 학습하는 것 대비 더 빠르고 우수한 성능 달성</li>
</ul>

<h3 id="2-디노이징-네트워크-denoising-network">2. 디노이징 네트워크 (Denoising Network)</h3>

<ul>
  <li>유전자 발현 데이터의 특성(긴 길이, 희소성, 비정렬)을 고려한 새로운 아키텍처 설계</li>
  <li>CNN이나 Transformer 대신 <strong>스킵 연결 다층 퍼셉트론(skip-connected MLP)</strong> 기반 구조 채택</li>
  <li>스킵 연결 구조는 다양한 레벨의 특징을 보존하고 정보 손실을 줄임</li>
  <li>확산 과정의 <strong>역방향 과정(reverse process)</strong>을 학습하여 노이즈를 제거하고 의미있는 임베딩을 복원</li>
</ul>

<h3 id="3-조건부-제어기-condition-controller--분류기-가이던스">3. 조건부 제어기 (Condition Controller) — 분류기 가이던스</h3>

<ul>
  <li>세포 유형 분류기(classifier)를 통한 <strong>분류기 가이던스(classifier guidance)</strong> 방식 적용</li>
  <li>분류기는 4층 MLP 구조로, 세포 유형이나 조직 유형 등의 조건 레이블을 예측</li>
  <li>디노이징 네트워크의 학습에 간섭하지 않고 <strong>별도로 학습</strong>되어 그래디언트를 제공</li>
  <li><strong>다중 분류기</strong>를 동시에 사용하여 여러 조건 조합으로 데이터 생성 가능</li>
</ul>

<hr />

<h2 id="학습-및-추론-파이프라인">학습 및 추론 파이프라인</h2>

<h3 id="학습-단계-training">학습 단계 (Training)</h3>

<table>
  <thead>
    <tr>
      <th>단계</th>
      <th>내용</th>
      <th>세부 사항</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Step 1</strong></td>
      <td>오토인코더 미세 조정</td>
      <td>SCimilarity 사전 학습 가중치 기반, 첫/마지막 레이어만 재학습</td>
    </tr>
    <tr>
      <td><strong>Step 2</strong></td>
      <td>디노이징 네트워크 학습</td>
      <td>600,000~800,000 스텝, 학습률 어닐링 적용</td>
    </tr>
    <tr>
      <td><strong>Step 3</strong></td>
      <td>분류기 학습</td>
      <td>~200,000 이터레이션, 데이터셋 카테고리 수에 맞춰 설정</td>
    </tr>
  </tbody>
</table>

<p>학습 과정에서 SCimilarity 모델이 유전자 발현 프로파일을 잠재 공간으로 임베딩한 후, 확산 과정을 통해 각 임베딩에 노이즈를 추가하여 노이즈 임베딩 시퀀스를 생성한다. 이 노이즈 임베딩이 디노이징 네트워크의 학습 데이터가 되며, 동시에 분류기는 임베딩으로부터 조건 레이블을 예측하도록 학습된다.</p>

<h3 id="추론-단계-inference">추론 단계 (Inference)</h3>

<ol>
  <li>가우시안 노이즈를 입력으로 받음</li>
  <li>디노이징 네트워크가 <strong>T=1,000 스텝</strong>에 걸쳐 반복적으로 노이즈를 제거</li>
  <li>분류기 가이던스 또는 Gradient Interpolation 전략으로 생성 방향 제어</li>
  <li>생성된 잠재 임베딩을 디코더에 입력하여 최종 유전자 발현 프로파일 복원</li>
</ol>

<hr />

<h2 id="gradient-interpolation-전략">Gradient Interpolation 전략</h2>

<p>scDiffusion의 가장 독창적인 기여 중 하나는 <strong>Gradient Interpolation</strong> 전략이다. 이는 이산적인 세포 상태(discrete cell states)로부터 <strong>연속적인 세포 발달 궤적(continuous developmental trajectory)</strong>을 생성할 수 있는 새로운 제어 전략이다.</p>

<p>기존의 직접 보간(linear interpolation)과 달리, Gradient Interpolation은 확산 모델이 학습한 세포 분포를 활용하여 생물학적으로 더 타당한 중간 상태를 생성한다. 구체적으로:</p>

<ul>
  <li>두 개의 조건(예: 발달 시점 day 3과 day 4.5)을 분류기에 입력</li>
  <li>두 조건 사이의 그래디언트를 보간하여 중간 상태를 연속적으로 생성</li>
  <li>시퀀싱 간격 사이의 빈 구간을 채워 더 포괄적인 발달 타임라인 제공</li>
</ul>

<hr />

<h2 id="실험-데이터셋">실험 데이터셋</h2>

<table>
  <thead>
    <tr>
      <th>데이터셋</th>
      <th>세포 수</th>
      <th>유전자 수</th>
      <th>세포 유형</th>
      <th>용도</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Tabula Muris</strong></td>
      <td>54,865</td>
      <td>23,433</td>
      <td>55</td>
      <td>비조건부/단일조건부 생성</td>
    </tr>
    <tr>
      <td><strong>PBMC68k</strong></td>
      <td>68,579</td>
      <td>~2,000+</td>
      <td>11</td>
      <td>비조건부/단일조건부 생성</td>
    </tr>
    <tr>
      <td><strong>Human Lung PF</strong></td>
      <td>-</td>
      <td>-</td>
      <td>-</td>
      <td>비조건부 생성</td>
    </tr>
    <tr>
      <td><strong>Tabula Sapiens</strong></td>
      <td>-</td>
      <td>-</td>
      <td>-</td>
      <td>다중조건부 생성</td>
    </tr>
    <tr>
      <td><strong>Mouse embryo</strong></td>
      <td>-</td>
      <td>-</td>
      <td>-</td>
      <td>Gradient Interpolation</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="주요-결과">주요 결과</h2>

<h3 id="1-비조건부-생성-unconditional-generation">1. 비조건부 생성 (Unconditional Generation)</h3>

<p>scDiffusion이 조건 없이 생성한 세포 데이터의 품질을 scGAN, scDesign3, SPARSim, SCRIP과 비교했다.</p>

<table>
  <thead>
    <tr>
      <th>메트릭</th>
      <th>scDiffusion</th>
      <th>scGAN</th>
      <th>scDesign3</th>
      <th>SPARSim</th>
      <th>SCRIP</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>SCC</strong> (↑)</td>
      <td><strong>0.984</strong></td>
      <td>유사</td>
      <td>유사</td>
      <td>낮음</td>
      <td>낮음</td>
    </tr>
    <tr>
      <td><strong>MMD</strong> (↓)</td>
      <td><strong>0.018</strong></td>
      <td>유사</td>
      <td>유사</td>
      <td>높음</td>
      <td>높음</td>
    </tr>
    <tr>
      <td><strong>LISI</strong> (↑)</td>
      <td><strong>0.887</strong></td>
      <td>유사</td>
      <td>유사</td>
      <td>낮음</td>
      <td>낮음</td>
    </tr>
    <tr>
      <td><strong>RF AUC</strong> (→0.5)</td>
      <td><strong>0.697</strong></td>
      <td>유사</td>
      <td>유사</td>
      <td>낮음</td>
      <td>낮음</td>
    </tr>
  </tbody>
</table>

<ul>
  <li><strong>SCC (Spearman Correlation Coefficient)</strong>: 실제와 생성 데이터 간 유전자 발현 상관관계. 0.984로 매우 높은 유사도</li>
  <li><strong>MMD (Maximum Mean Discrepancy)</strong>: 분포 간 거리. 0.018로 실제 데이터와 매우 근접</li>
  <li><strong>LISI (Local Inverse Simpson’s Index)</strong>: 실제와 생성 세포의 혼합 정도. 0.887로 높은 혼합도</li>
  <li><strong>RF AUC</strong>: 랜덤 포레스트가 실제/생성 세포를 구분하는 정확도. 0.5에 가까울수록 구분 불가능</li>
</ul>

<p>UMAP 시각화에서도 scDiffusion이 생성한 세포들이 실제 세포들과 잘 겹치는 것을 확인할 수 있었다.</p>

<h3 id="2-단일-조건부-생성-single-conditional-generation">2. 단일 조건부 생성 (Single-Conditional Generation)</h3>

<p>세포 유형 분류기를 가이드로 사용하여 <strong>특정 세포 유형</strong>의 데이터를 조건부로 생성했다.</p>

<ul>
  <li><strong>Tabula Muris</strong>: 세포 유형별로 동일 수의 세포를 조건부 생성. UMAP에서 실제 세포와 시각적으로 중첩</li>
  <li><strong>희귀 세포 유형도 성공적으로 생성</strong>:
    <ul>
      <li>Thymus 세포 (전체의 2.5%): 정확하게 생성</li>
      <li>CD34+ 세포 (전체의 0.4%): 정확하게 생성</li>
    </ul>
  </li>
  <li><strong>KNN 평가</strong>: 모든 세포 유형에서 AUC가 0.5 근처로, KNN 모델이 실제와 생성 세포를 구분하지 못함</li>
  <li><strong>CellTypist 분류</strong>: 생성 세포의 평균 분류 정확도 <strong>0.93</strong> (실제 세포: 0.98)
    <ul>
      <li>비교: scDesign3 생성 세포는 평균 0.99, scGAN 생성 세포는 평균 0.04 (구분 불가)</li>
    </ul>
  </li>
</ul>

<h3 id="3-다중-조건부-생성-multi-conditional-generation">3. 다중 조건부 생성 (Multi-Conditional Generation)</h3>

<p><strong>세포 유형 + 조직 유형</strong> 두 가지 분류기를 동시에 사용하여, <strong>학습 데이터에 없었던 조합</strong>의 세포를 생성하는 실험을 수행했다.</p>

<p>예를 들어, 학습 데이터에 유방(mammary gland)의 B세포는 있었지만, 흉선(thymus)의 기억 B세포(memory B cell)나 비장(spleen)의 대식세포(macrophage)는 없었다. scDiffusion은 이러한 <strong>분포 외(out-of-distribution)</strong> 조합도 성공적으로 생성했다:</p>

<table>
  <thead>
    <tr>
      <th>생성 조건</th>
      <th>CellTypist 분류 정확도 (생성)</th>
      <th>CellTypist 분류 정확도 (실제)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>유방 B세포</td>
      <td>98%</td>
      <td>92%</td>
    </tr>
    <tr>
      <td>흉선 기억 B세포</td>
      <td>96.75%</td>
      <td>96.91%</td>
    </tr>
    <tr>
      <td>비장 대식세포</td>
      <td>96.63%</td>
      <td>99.53%</td>
    </tr>
  </tbody>
</table>

<p>마커 유전자(marker gene) 발현 수준에서도 생성된 세포가 실제 세포와 유사한 패턴을 보였다.</p>

<h3 id="4-gradient-interpolation을-통한-발달-궤적-재구성">4. Gradient Interpolation을 통한 발달 궤적 재구성</h3>

<p>마우스 배아 발달 데이터에서 day 0~8 시점의 세포(day 3.5와 day 4 제외)로 scDiffusion을 학습한 뒤, day 3과 day 4.5 사이의 <strong>중간 발달 상태</strong>를 Gradient Interpolation으로 생성했다.</p>

<table>
  <thead>
    <tr>
      <th>메트릭</th>
      <th>scDiffusion (Gradient Interpolation)</th>
      <th>직접 보간 (Linear Interpolation)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Mean MMD</strong> (↓)</td>
      <td><strong>0.3217</strong></td>
      <td>0.5206</td>
    </tr>
    <tr>
      <td><strong>Mean LISI</strong> (↑)</td>
      <td><strong>0.4488</strong></td>
      <td>0.3217</td>
    </tr>
    <tr>
      <td>State 6 MMD</td>
      <td><strong>0.047</strong></td>
      <td>0.1289</td>
    </tr>
    <tr>
      <td>State 8 MMD</td>
      <td><strong>0.0545</strong></td>
      <td>0.1167</td>
    </tr>
  </tbody>
</table>

<p>scDiffusion은 치료(treatment) 정보 없이 학습되었음에도, 직접 보간보다 유의미하게 우수한 성능을 보였다. 이는 확산 모델이 세포 분포의 다양한 양상을 잘 포착하고, 중간 상태를 적절히 보간할 수 있음을 시사한다.</p>

<hr />

<h2 id="의의-및-한계">의의 및 한계</h2>

<h3 id="의의">의의</h3>

<ol>
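  <li><strong>Use of a foundation model</strong>: SCimilarity, a large-scale pretrained model, serves as the autoencoder and provides a universal representation of gene expression</li>
  <li><strong>Multi-conditional generation</strong>: Combining several classifiers makes it possible to generate cells for condition combinations absent from the training data</li>
  <li><strong>Gradient Interpolation</strong>: An original strategy for generating continuous trajectories between discrete developmental time points</li>
  <li><strong>Rare cell type generation</strong>: Even cell types that make up only 0.4% of the data can be generated accurately</li>
  <li><strong>Extensibility</strong>: In principle the approach also applies to other kinds of single-cell data such as multi-omics (continued in the follow-up work scDiffusion-X)</li>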
</ol>

<h3 id="한계-및-후속-연구-방향">Limitations and Directions for Follow-up Work</h3>

<ul>
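  <li><strong>Training cost</strong>: The autoencoder, denoising network, and classifiers must be trained in sequence, and the denoising network alone needs 600,000 to 800,000 steps</li>
  <li><strong>Classifier dependence</strong>: Conditional generation requires training a separate classifier, which is a limitation. The follow-up work <strong>cfDiffusion</strong> addresses this with classifier-free guidance</li>
  <li><strong>Need for condition labels</strong>: Conditional generation requires label information such as cell type, so it is of limited use on unlabeled data</li>
  <li><strong>Limits of evaluation metrics</strong>: Standardized benchmarks for comprehensively assessing the quality of generated single-cell data are still lacking</li>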
</ul>

<hr />

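<h2 id="개인적-소감">Personal Thoughts</h2>

<p>scDiffusion is an interesting attempt to bring diffusion models, which have produced revolutionary results in image generation, to single-cell genomics. The parts I found most impressive are:</p>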

<ol>
  <li>
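    <p><strong>Combining a foundation model with a diffusion model</strong>: Pairing the representational power of SCimilarity, trained on 22.7 million cells, with the generative power of a diffusion model is an effective strategy. Mapping into a latent space elegantly handles the high dimensionality and sparsity that are specific to gene expression data.</p>
  </li>
  <li>
    <p><strong>OOD generation</strong>: Being able to generate condition combinations absent from the training data (for example, a specific cell type in a specific tissue) has great practical value, because cell data for conditions that are hard to obtain experimentally can be generated in silico.</p>
  </li>
  <li>
    <p><strong>Gradient Interpolation</strong>: From a developmental biology perspective, filling the gaps between sequencing time points to reconstruct a continuous developmental trajectory is a powerful tool.</p>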
  </li>
</ol>

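<p>From an aging research perspective, models like scDiffusion could be used to model how cell types change during aging. In particular, they might be applied to augment data for specific immune or stem cells that decline with age, or to reconstruct the continuous trajectory of transcriptomic change during aging through Gradient Interpolation.</p>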

<hr />

<p><strong>References</strong></p>
<ul>
  <li>Luo, E., Hao, M., Wei, L. &amp; Zhang, X. scDiffusion: conditional generation of high-quality single-cell data using diffusion model. <em>Bioinformatics</em> <strong>40</strong>, btae518 (2024). <a href="https://doi.org/10.1093/bioinformatics/btae518">https://doi.org/10.1093/bioinformatics/btae518</a></li>
  <li><a href="https://pubmed.ncbi.nlm.nih.gov/39171840/">PubMed</a> | <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11368386/">PMC Full Text</a> | <a href="https://github.com/EperLuo/scDiffusion">GitHub</a></li>
</ul>

  </div>

 
</article>

</article>

CSS Customization

Stripping out the default theme and writing the styles yourself gives a cleaner result.

/* assets/css/style.css */
* { margin: 0; padding: 0; box-sizing: border-box; }

body {
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
  color: #333;
  line-height: 1.8;
  max-width: 720px;
  margin: 0 auto;
  padding: 2rem 1.5rem;
}

h1, h2, h3 { margin-top: 2rem; margin-bottom: 0.5rem; }

a { color: #0969da; text-decoration: none; }
a:hover { text-decoration: underline; }

pre {
  background: #f6f8fa;
  border-radius: 6px;
  padding: 1rem;
  overflow-x: auto;
  margin: 1rem 0;
}

code { font-size: 0.9em; }

.tag {
  display: inline-block;
  background: #e8f0fe;
  color: #1a73e8;
  padding: 0.1rem 0.5rem;
  border-radius: 4px;
  font-size: 0.85em;
}

Post List (Home Page)

The following template shows a list of the latest posts on the home page:

<!-- index.html -->
---
layout: default
title: Home
---
<h2>Latest Posts</h2>

<!-- The limit value is illustrative -->
{% for post in site.posts limit:10 %}
  <article>
    <h3><a href="{{ post.url }}">{{ post.title }}</a></h3>
    <time>{{ post.date | date: "%Y-%m-%d" }}</time>
    <p>{{ post.excerpt | strip_html | truncatewords: 30 }}</p>
  </article>
{% endfor %}

site.posts returns every post, sorted newest first. The limit parameter of the for tag controls how many posts are shown, and post.excerpt supplies the preview of the body.

Step 6: Deployment

Commit and push all changes, and GitHub Actions builds and deploys the site automatically.

git add .
git commit -m "Initial blog setup with Jekyll"
git push origin main

Deployment status is shown in the repository's Actions tab. The site is usually live at https://username.github.io within one to two minutes.

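Custom Domain Setup

To use your own domain:

After buying the domain, add a CNAME record to its DNS:

CNAME    www    username.github.io

Create a CNAME file in the repository root:

www.yourdomain.com

In the GitHub repository, open Settings → Pages, enter the custom domain, and enable HTTPS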
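
The CNAME file step is easy to get wrong by pasting a full URL instead of a hostname. As a minimal sketch, the helper below writes the file only when the value looks like a bare hostname. The function name write_cname is hypothetical and not part of GitHub Pages or Jekyll.

```shell
#!/usr/bin/env bash
# write_cname: write the CNAME file that GitHub Pages reads for a custom domain.
# Hypothetical helper; the file must hold only the hostname, with no scheme or path.
write_cname() {
  local domain="$1" dir="${2:-.}"
  case "$domain" in
    ""|*://*|*/*|*" "*)
      echo "expected a bare hostname such as www.yourdomain.com" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$domain" > "${dir}/CNAME"
}

# Example, run from the repository root:
#   write_cname www.yourdomain.com
# Once DNS has propagated, `dig www.yourdomain.com CNAME +short` shows the record target.
```

Commit the resulting CNAME file together with the rest of the site so that a later deployment does not drop the domain setting.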

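Automating the Blog with Claude

That covers the basic GitHub Pages setup. The rest of this post looks at how to automate day-to-day blog operations with Claude.

Managing the Blog with Claude Code

Claude Code is an AI coding agent that runs in the terminal. Run it inside the blog repository and it can handle writing posts, editing code, and deploying in one place.

# Run Claude Code from the blog directory
cd username.github.io
claude

1. Defining Blog Rules in CLAUDE.md

Create a .claude/CLAUDE.md file under the project root and Claude picks up the blog's rules and style automatically.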

# Blog Rules

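## Post rules
- File name: YYYY-MM-DD-title-in-english.md
- Location: the _posts/ directory
- Required front matter fields: layout, title, date, tags

## Markdown style
- Headings start at ## (H1 is used automatically for the post title)
- Code blocks must carry a language tag (```python, ```bash, etc.)
- Images are stored in the assets/images/ directory

## Tone
- Technical posts: clear and concise
- General posts: natural and friendly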

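With this file in place, asking Claude to “write a new post” is enough to get a post in the correct format and style. There is no need to explain the project's conventions each time; Claude reads CLAUDE.md and follows the rules.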
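
The same rules can also be checked mechanically, which helps when reviewing what Claude generated. The sketch below covers only the file name and required front matter fields from the rules above. The function name check_post is hypothetical, and the lowercase, hyphen-separated slug pattern is an assumption about what "title-in-english" means.

```shell
#!/usr/bin/env bash
# check_post: verify one post against the file name and front matter rules.
# Hypothetical helper; not part of Jekyll or Claude Code.
check_post() {
  local file="$1" name field
  name="$(basename "$file")"

  # Rule: file name is YYYY-MM-DD-title-in-english.md (lowercase slug assumed)
  if ! printf '%s\n' "$name" |
      grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}-[a-z0-9]+(-[a-z0-9]+)*\.md$'; then
    echo "bad file name: $name"
    return 1
  fi

  # Front matter must open on the first line
  if [ "$(head -n 1 "$file")" != "---" ]; then
    echo "missing front matter: $name"
    return 1
  fi

  # Rule: layout, title, date, tags must appear between the first two --- lines
  for field in layout title date tags; do
    if ! awk '/^---$/ { n++; next } n == 1' "$file" | grep -q "^${field}:"; then
      echo "missing field '${field}': $name"
      return 1
    fi
  done
  echo "ok: $name"
}
```

A loop such as `for f in _posts/*.md; do check_post "$f"; done` runs the check over every post.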

2. Automating Post Templates with a Custom Slash Command

Defining custom commands in the .claude/commands/ directory automates repetitive tasks.

<!-- .claude/commands/write-post.md -->
---
description: "Write a blog post"
---

# Write a blog post

Write a blog post about $ARGUMENTS.

## Rules
- Create the file as _posts/YYYY-MM-DD-title.md
- Include layout, title, date, and tags in the front matter
- Follow the Markdown style defined in CLAUDE.md
- For posts that contain code, include runnable example code

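Use it in Claude Code like this:

> /write-post Setting up a development environment with Docker Compose

This single command generates a correctly formatted post file. The text the user types is passed in through $ARGUMENTS, so you only supply the topic and Claude takes care of the rest.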
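
For comparison, the file skeleton that such a command is expected to produce can be sketched as a plain shell function. This is only an illustration of the target format, not what Claude Code runs internally. The name new_post and the `layout: post` value are assumptions.

```shell
#!/usr/bin/env bash
# new_post: create _posts/YYYY-MM-DD-<slug>.md with the required front matter.
# Hypothetical helper; "layout: post" assumes the site defines a layout named post.
new_post() {
  local slug="$1" title="$2" dir="${3:-_posts}"
  local today file
  today="$(date +%Y-%m-%d)"
  file="${dir}/${today}-${slug}.md"

  # Refuse to overwrite an existing post
  if [ -e "$file" ]; then
    echo "already exists: $file" >&2
    return 1
  fi

  mkdir -p "$dir"
  # Front matter with the four required fields; tags start empty
  cat > "$file" <<EOF
---
layout: post
title: "${title}"
date: ${today}
tags: []
---
EOF
  echo "$file"
}
```

Calling `new_post docker-compose-dev-env "Docker Compose dev environment"` prints the path of the new file. A title that itself contains double quotes would need escaping before it is valid YAML.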

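3. Building CI/CD with GitHub Actions + Claude

You can set up a pipeline that automatically builds and validates the site whenever a post is committed. The workflow below covers the build and link-check stages; deployment is still handled by GitHub Pages itself.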

# .github/workflows/blog-ci.yml
name: Blog CI/CD

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: '3.2'
          bundler-cache: true

      - name: Build Jekyll site
        run: bundle exec jekyll build

      - name: Check for broken links
        run: |
          gem install html-proofer
          htmlproofer ./_site --disable-external

When a build fails, Claude Code can analyze the error and fix it:

# Inspect and fix a build error
claude "The Jekyll build failed. Analyze the error log and fix it"

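4. Automating Post Writing with Claude

In Claude Code you can request a post in natural language.

> Write a post that summarizes the Server Components concept in React 18

> Write a post about how I cut image size with a Docker multi-stage build in last week's project

Claude follows the rules in CLAUDE.md and produces a post in the matching style, including the front matter format, the file name, and the section structure.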

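5. Improving Existing Posts

Posts that are already written can be improved with Claude as well.

# Fix typos and grammar
claude "Find and fix typos and awkward phrasing in recent posts"

# Improve post SEO
claude "Improve the tags of recent posts for SEO"

# Add image alt text
claude "Check that every image in every post has suitable alt text"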
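
The alt text check in particular has a mechanical part that can run before Claude is involved. As a rough sketch, the function below lists Markdown images whose alt text is empty. The name find_missing_alt is hypothetical, and raw HTML img tags are not covered.

```shell
#!/usr/bin/env bash
# find_missing_alt: list Markdown images with empty alt text, e.g. ![](a.png).
# Hypothetical helper; only the ![alt](path) syntax is checked.
find_missing_alt() {
  local dir="${1:-_posts}"
  # -r: recurse, -n: print line numbers, -E: extended regular expressions.
  # The pattern matches "![" followed by optional whitespace and "](".
  grep -rnE '!\[[[:space:]]*\]\(' "$dir"
}
```

grep exits with a non-zero status when nothing matches, so `find_missing_alt || echo "all images have alt text"` reports a clean result. Writing a meaningful description for each listed image is the part worth handing to Claude.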

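6. Automating New Blog Features

Claude can also help when adding new features.

# Add search
claude "Add client-side search to the blog"

# Implement dark mode
claude "Implement a dark mode toggle using CSS variables"

# Set up an RSS feed
claude "Set up a Jekyll RSS feed"

You do not have to write the code yourself; Claude works out the project structure and creates or edits files in the appropriate places.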

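Blog Operation Tips

SEO Basics

Generate meta tags automatically with the jekyll-seo-tag plugin
Generate sitemap.xml automatically with jekyll-sitemap
Write a subtitle or description for each post so that search results show a summary (jekyll-seo-tag reads the description field)
Add alt text to images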

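Writing Habits

Post on a regular schedule (even once a week, as long as it is consistent)
Define categories up front and write across them evenly
Make code examples runnable
Short TIL (Today I Learned) posts are fine too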

Performance Optimization

Convert images to WebP to reduce file size
Minify CSS and JS
Defer image loading with lazy loading
Check periodically with Google PageSpeed Insights
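
The WebP conversion can be scripted. The sketch below assumes the cwebp command-line encoder from libwebp is installed and that images live in assets/images/. The function names and the quality value of 80 are illustrative choices.

```shell
#!/usr/bin/env bash
# webp_path: map an image path to its .webp sibling,
# e.g. assets/images/a.png -> assets/images/a.webp
webp_path() {
  printf '%s\n' "${1%.*}.webp"
}

# convert_images: encode every PNG/JPEG under a directory to WebP with cwebp.
# Hypothetical helper; quality 80 is an assumed starting point, not a recommendation.
convert_images() {
  local dir="${1:-assets/images}" src dst
  command -v cwebp >/dev/null 2>&1 || { echo "cwebp not found" >&2; return 1; }
  find "$dir" -type f \( -name '*.png' -o -name '*.jpg' -o -name '*.jpeg' \) |
  while IFS= read -r src; do
    dst="$(webp_path "$src")"
    # Skip images whose WebP copy is newer than the source
    [ "$dst" -nt "$src" ] && continue
    cwebp -quiet -q 80 "$src" -o "$dst"
  done
}
```

Running `convert_images` from the repository root leaves the originals in place, so posts can switch to the .webp files gradually. Lazy loading itself needs no script: it is the loading="lazy" attribute on the img tag.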

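Closing Thoughts

The GitHub Pages + Jekyll combination is one of the most efficient blogging setups for developers. It fits naturally into a Git workflow, lets you write quickly in Markdown, and is free.

Adding Claude automates the repetitive parts of running a blog: generating post templates, managing front matter, reviewing code, and adding features. Define the blog's rules in CLAUDE.md, register frequent tasks as custom slash commands, and you get an environment where you can focus on writing.

In the end, what matters for a blog is writing consistently. Tools and automation only make that a little easier.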