Refact Fork of Linguist

The original library is at https://github.com/github/linguist

This library is used on GitHub.com to detect blob languages and to ignore binary or vendored files.

This fork is used in the Refact self-hosting server to preprocess source code files as part of the fine-tuning procedure.

Installation

Linguist is written in Ruby. Ruby is a lightweight dependency that will not clutter your system much. On Ubuntu, install Ruby and the build dependencies:

sudo apt-get install -y ruby-full ruby-bundler build-essential cmake pkg-config libicu-dev zlib1g-dev libcurl4-openssl-dev libkrb5-dev libssl-dev

Then clone the repository and build the gem:

git clone https://github.com/smallcloudai/linguist
cd linguist
bundle install
rake build_gem

Refact will look for the executable smc-linguist in PATH.

Usage

It is designed for batch mode: it reads file paths from stdin, one per line, and writes to stdout:

echo -e "/etc/timezone\n/etc/lsb-release" | smc-linguist
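For a whole directory rather than a couple of hand-typed paths, the same pattern can be wrapped in a small helper. This is only a sketch: the function name classify_tree and the LINGUIST_CMD override are hypothetical names introduced for illustration, and the output format of smc-linguist is not described here, so the result is passed through unchanged.

```shell
#!/usr/bin/env bash
# Sketch: feed every regular file under a directory to smc-linguist in one batch.
# LINGUIST_CMD is a hypothetical override added for illustration; it defaults to
# the smc-linguist executable that Refact looks for in PATH.
LINGUIST_CMD="${LINGUIST_CMD:-smc-linguist}"

classify_tree() {
    local root="$1"
    # Emit one path per line on stdin, matching the echo example above.
    # Assumes file names do not contain newline characters.
    find "$root" -type f | sort | "$LINGUIST_CMD"
}
```

After sourcing the script, something like `classify_tree ./my_project > languages.txt` would collect the results for later inspection.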

Criticism

If you take a Python file and change its extension to .cpp (C++), linguist will not recognize that the file is still Python.

In other words, it does not look into the text itself sufficiently. A better solution may be needed.
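The renaming experiment can be reproduced with a short script. This is a hedged sketch, not part of the fork: compare_extensions, the sample file names, and the LINGUIST_CMD override are all hypothetical, and no particular output is assumed. It writes one Python source file, copies it under a .cpp name, and passes both paths to smc-linguist so the two results can be compared side by side.

```shell
#!/usr/bin/env bash
# Reproduction sketch: identical Python source under two different extensions.
# LINGUIST_CMD is a hypothetical override; it defaults to smc-linguist from PATH.
LINGUIST_CMD="${LINGUIST_CMD:-smc-linguist}"

compare_extensions() {
    local dir="$1"
    # Write a small, unmistakably Python file.
    printf 'def greet(name):\n    print(f"hello, {name}")\n' > "$dir/sample.py"
    # Same bytes, different extension.
    cp "$dir/sample.py" "$dir/sample.cpp"
    # Pass both paths in one batch, one per line.
    printf '%s\n' "$dir/sample.py" "$dir/sample.cpp" | "$LINGUIST_CMD"
}
```

Calling `compare_extensions "$(mktemp -d)"` after sourcing the script should show whether the two copies are classified differently.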