git clone ssh://git.bobignou.red:23231/nextvi
Nextvi is a vi/ex editor. It can edit bidirectional UTF-8 text. This branch is my fork of the original with my patches applied.
As of 2022-01-01 Nextvi development has been driven to completion. Be sure to read Q1 in the FAQ section of this file before doing anything else.
Nextvi tends to favor clean implementations over 100% adhering to POSIX like neatvi does. Thus the name "next", to signify that we are thinking outside the box. The notable changes that I am referring to (see below): 56, 60, 61. Some keybinds specified by POSIX were not implemented in original neatvi, nextvi might use them for some features below.
-2: always right-to-left. shape If set (default), performs Arabic/Farsi letter shaping. order If set, reorder characters based on the rules defined in conf.c. hl If set (default), text will be highlighted based on syntax highlighting rules in conf.c. hll If set, highlight current line. ai As in vi(1). aw As in vi(1). ic As in vi(1).
special marks:
the position of the previous change [ the first line of the previous change ] the last line of the previous change
ex commands: :cm[ap][!] [kmap] Without kmap, prints the current keymap name. When kmap is specified, sets the alternate keymap to kmap and, unless ! is given, switches to this keymap. :ft [filetype] Without filetype, prints the current file type. When filetype is specified, sets the file type of the current ex buffer. In nextvi :ft also reloads the highlight ft, which makes it possible to reset dynamic highlights created by options like "hlw".
new key mappings (normal): ^a searches for the word under the cursor. zL, zl, zr, and zR change the value of td option. ze and zf switch to the English and alternate keymap. gu, gU, and g~ switch character case. ^l updates terminal dimensions and redraws the screen.
new key mappings (insert): ^p inserts the contents of the default yank buffer. ^e and ^f switch to the English and alternate keymap.
vi(1)
- for features unspecified refer to the respective page in the POSIX manual.
vi_arg1
or motions as a region.vi_arg1
is specified right before ^7 the buffer will be switched immediately
which also happens to permit numbers > 9.vi_arg1
, in which case it will repeat the operation on next
line.vi_arg1
for
how many lines to repeat said operation. When last operation was 'i' or other
commands that enter insert mode and some text, that text will be placed at
the same offset on N number of lines specified by the vi_arg1
.vi_arg1
will exit the mode. This is a major step up to how navigation
works in vi, it makes it so much easier to use because now you can see where
you are going. These special numbers and their colors can be customized
through conf.c. For complexity reasons the feature does not work correctly
if double width characters are present or text is set in reverse and reordered.syn_reloadft();
)[^;...]
to sort out all the punctuation
chars from words and build a database of words. But in order to take full
advantage out of the completion system, you can change this regex at runtime
using new ex command "ac". For example say we don't want word completion
and we want entire line completion instead, run :ac .
If the regex rule allows inclusion of nonalphanumeric or punctuation characters
you won't be able to retrieve the string like it works with words by default.
Automatically determining the position from which completion starts
inside insert sbuf is ambiguous. More fine tuned control is needed and
can be achieved using new keybind ^z in insert. Use ^z before you type out
search term, this will set start position for completion, such
that the options can be looped over inplace. ^z is a toggle, to disable
it (without exiting insert) press twice in same place. When ^z is set
^u keybind will delete everything until ^z mark first, otherwise ^u
operates normally (deletes everything).
Autocomplete db is persistent throughout all buffers and it also has data
duplication and redundancy checking such that same files can be indexed many
times.
Like ^g used to index file in insert mode, ^y can be used from insert to
clear out the completion db.
Running ex command "ac" with no argument will reset back to the default
word filtering regex. You can find it's string in led.c as a reference.
Using ^b from insert mode will display all possible autocomplete options.
Added new ex option "pac". When enabled the autocomplete options will
be automatically displayed.:inc submodule.*\.c$
Example 2:
Exclude the .git folder filled with useless clutter.
:inc (^[^&&.git]+[^\/]+$)\\|(<optional branch for exceptions here>)
Running "inc" without an arg will disable all filters.\
Commands listed below under (33.) will work everywhere in vi.\
again will dynamically update the
directory info at current working dir.vi_arg1
and the word(s) under the cursor will be placed into prompt,
for example :.,.+5s/\<word\>/
where 5 is vi_arg1
. (see also 49.)xkmap_alt
works, now z + vi_arg1
in
normal mode will switch what keymap ^f key changes, so for example 1 = fa 2 =
ru. If you need some other language kmap just make one yourself. Look into
kmap.h it's really simple. kmap also has digraph support, that list has been
expanded to include non printing characters. Naming resembles ASCII caret
notation, so for example 03 is key ^c to be able to type 03 enter insert, use
^k to go into digraph mode, type then " c" (space+c) Essentially space is
representing typeable ctrl key. The reason this is useful is if you are
using @ key to make macros there has to be a way to type these control
characters. Alternatively, it's possible to type the character literal using
^a + ^(insertkeyhere). In neatvi this was binded to ^v, currently ^v is used
for 48. Nextvi shall keep both, the digraph way and ^a for clarity and better
user reference, no more searching ascii keycodes online if kmap has it in the
clear.vi_arg1
). Regex control chars will be escaped.
vr vt v/ ^] ^p will grab word(s) only if vi_arg1
>= 1 otherwise the keybind
will perform a default cursor independent action.[^&&abc]
where ^&&
is the not+and. This will treat
"abc" as !(a && b && c)
logically. Everything in brackets preceding && will
be processed according to standard negated char exp.vi_arg1
).
Advantage of ^e and ^y over using ^u and ^d is keeping the same vi_col
position.highlight -> func
struct member set. If ft does not provide a spot in hl, the
latter feature will not work on that ft, regardless of ex options being set.[ ]+(.[^ ]+)
since only the capture result for 2nd group matters use
the "grp" like this: :se grp=4 .The number 4 is important, it is calculated
using: grpnum * 2, to get the second group number do: 2 * 2 = 4. The first
group is always 2, 1 * 2 = 2, :se grp=2 gets you back to default group/behavior.vi_arg1
which repeats the
operation N times.vi_arg1
changes the direction of ^n.--help
and --version
command line arguments to show some help messages. - Escapes in regex bracket expressions. This isn't posix but it solves
couple of issues that were bugged previously, like escaping | in ex
substitution command, properly counting number of groups in rset.
- A single slash requires 2 slashes, and so on.
- rset_make() requires for ( to be escaped if used inside [] brackets.
- In ex prompt the only separator is "|" character and there is some
counter intuitive handling of escapes done that has to be explained.
1: to use ex separator input "|"
2: to get "|" into the result string input = "\\|"
3: to get "\|" into the result string input = "\|"
4: to get "\\|" into the result string input = "\\\|"
5: to get "\\\|" into the result string input = "\\\\|"
A careful reader will spot that the rules above don't exactly
follow the well established rule - "2 slashes make one". This is
because to allow any combination of "|" and "\" one would have to
force a "4 slashes make one" rule. Imagine having to type "\\("
to escape "(" instead of "\(" everywhere if we want to be consistent.
The latter sucks, so instead we make this special case for ex prompt
that applies only to "|" and "\". In summary, just remember what clauses
(2:) and (3:) state, for cases where there are lots of slashes the
rule is input minus 1 of the "\" equals the output string. Keep in mind
that various ex functions, for example regex will consume extra slashes
internally if there is more than 1 after another.
If we wanted to match string "\\|" input will be: "\\\\\\|"
"\\\|" will get us "\\|" into regex, but the input
needs 3 more slashes because +1 rule (5:) +2 rule 2 slashes make 1.
Timeline: "\\\\\\|" -> rule(5:) "\\\\\|" -> rule(regex:2 = 1) "\\\|" =
rule(regex:"\|" = "|") "\\|" = this is the literal we wanted to match!
New functionality can be obtained through optional patches provided in the patches branch. If you have a meaningful contribution and would love to be made public the patch can be submitted via email or github pull request.
Q1: What's the best way to learn vi/nextvi?
A1: First ensure you know basic movements hjkl this would suffice.
Start reading vi.c don't worry about the rest of this readme until later.
Running ./vi vi.c use / and n N keybinds for search and look for switch cases.
The keybinds are encoded to be intuitive. Once you find some case that looks
like a keybind read the code, if you don't understand try to reproduce the
keybind to better understand what the code does. You have to do this, if you
omit this step you will never be able to realize the full potential that the
software provides. It's not desirable to live in the dark using this
software for the next 10 years only to find that for example, ^p in insert
mode exists and is very useful. If you can't figure how to use the keybind at
least you would know at the back of your mind that there is something there,
realization will come later. It's better to skim look through the switch
cases than to never even open it. This isn't an excuse, but a deliberate
design goal, where the user reads the code in order to achieve the full
control he/she desires.
LOC as of 2022-02-03:
+--------------+---------------------+
| 569 kmap.h | keymap translation |
| 422 vi.h | definitions/aux |
+--------------+---------------------+
| 579 uc.c | UTF-8 support |
| 332 term.c | low level IO |
| 296 conf.c | hl/ft/td config |
| 643 regex.c | extended RE |
| 590 lbuf.c | file/line buffer |
| 1149 ex.c | ex options/commands |
| 2184 vi.c | normal mode/general |
| 776 led.c | insert mode/output |
| 414 ren.c | positioning/syntax |
| 6963 total | wc -l *.c |
+--------------+---------------------+
The code is devised to be unquestionable. You will be able to read, understand and modify this code faster. Come back to this readme regularly as it documents more advanced behavior.
Q2: What does it mean when I call feature X a macro?
A2: It's the kind of shortcut that does not change the core functionality,
but rather reuses the core functionality. Usually macro features are
implemented in 1 or a few lines of code. Notably, they tend to use function
term_push()
, but it's not required. Because they are macros they may run
suboptimally or not handle every possible edge case. When calling a macro
feature from another macro, the results are pushed back, which means the
macro feature will always execute last, see (69.).
These features are considered a macro: 5. 6. 16. 17. 18. 19. 20. 21. 26. 27. 34. 46. 50. 65. 71.
Q3: Keybind with CTRL does not work?
A3: vi is reading ASCII codes sent by the terminal. Depending on the
keyboard, the ASCII code could be another key combination. It was reported
that "^^" (Ctrl + ^) can be achieved on some system with "^6". If something
doesn't work, have a look at the layout of an american/british keyboard and
try to reproduce the keybind as if you have an american/british keyboard.
Q4: Why nextvi instead of vim?
A4: I prefer customization in source code, Vim is considered harmful.
Q5: Why not distribute as patches, like on suckless.org?
A5: It's hard to maintain. Simply put, there are too many changes
to keep track of if compared to original neatvi.
Q6: Why are keybinds encoded as pure switch cases instead of more suckless.org style keybind function table dispatch?
A6: Because we want small efficient code that is easy to write.
In nextvi many keybinds interoperate so that they can do multiple tasks at
various conditions. Use of goto is encouraged, it is simply impossible to
achieve this behavior in a sensible way otherwise. Suckless code style
philosophy crumples when requirements are as complex as what vi needs to be
able to do. In other words, it depends - but if the standard of C provides
means to implementing things cleaner and faster you should use the smallest
form factor possible. In retrospect, it may be harder to find the
implementation itself, but once you do there is nothing else hidden from you.
The result is unabstracted program control flow that can be easily read and
modified reducing the risks of unforeseen side effects.
Q7: General philosophy?
A7: User is programmer, hacker culture.
In most text editors, flexibility is a minor or irrelevant design goal.
Nextvi is designed to be flexible where the editor adapts to the user needs.
This flexibility is achieved by heavily chaining basic commands and allowing
them to create new ones with completely different functionality. Command
reuse keeps the editor small without infringing on your freedom to quickly
get a good grasp on the code. If you want to customize anything, you should
be able to do it using the only core commands or a mix with some specific C
code for more difficult tasks. Simple and flexible design allows for straight
forward solutions to any problem long term and filters bad inconsistent ideas.
Q8: Something, something - pikevm
A8: Pikevm is a complete rewrite of nextvi's regex engine for the purposes of
getting rid of backtracking and severe performance and memory constraints.
Pikevm guarantees that all regular expressions are computed in constant space
and O(n+k) time where n is size of the string and k is some constant for the
complexity of the regex ie. number of state transitions. It is important to
understand that it does not mean that we run at O(n) linear speed, but rather
the amount of processing time & memory usage is distributed evenly and linearly
throughout the string, the k constant plays a big role. If you are familiar
with radix sort algorithms this follows the same idea.
Q: What are the other benefits?
A: For example, now it is possible to compute a C comment /* n */
where n can
be an infinite number of characters. Of course this extends to every other
valid regular expression.
Q: New features pikevm supports?
A: Additionally, pikevm supports PCRE style non capture group (?:) and lazy
quantifiers like .*?
and .+??
because they were easy to implement and allow
for further regex profiling/optimization.
Q: NFA vs DFA (identify)
A: pikevm = NFA backtrack = DFA
Q: What's wrong with original implementation?
A: Nothing except it being slow and limited. My improved version of Ali's DFA
implementation ran 3.5X faster in any case, however I found a bug with it
where quested "?" nested groups compute wrong submatch results. To fix this
problem, it would require to undo a lot of optimization work already done,
basically going back to how slow Ali's implementation would be. The reason
this was spotted so late was because this kind of regex wasn't used before,
so I never tested it. Other than that I think submatch extraction is correct
on other cases. Pikevm does not have this bug, so it will be used as main
regex engine from now on, unless dfa ever finds a proper fix. Honestly, this
change isn't so surprising, as I was working on pikevm a few months prior, to
favor a superior algorithm.
You can still find that code here (likely with no updates):
github.com/kyx0r/nextvi/tree/dfa_dead
As a downside, NFA simulation looses the DFA property of being able to
quickly short circuit a match, as everything runs linearly and at constant
speed, incurring match time overhead. Well optimized DFA engine can
outperform pikevm, but that is rather rare as they got problems of their own.
For example as independently benchmarked, dfa_dead
runs only 13% faster than
pikevm and that is stretching the limit of what is physically possible on a
table based matcher. Can't cheat mother nature, and if you dare to try she's
unforgiving at best.
Supplementary reading by Russ Cox:
swtch.com/~rsc/regexp/regexp1.html
Edit conf.c to adjust syntax highlighting rules and text direction patterns. To define a new keymap, create a new array in kmap.h, like kmap_fa, and add it to kmaps array in the same header (the first entry of the new array specifies its name). The current keymap may be changed with :cm ex command. When in input mode, ^e activates the English kaymap and ^f switches to the alternate keymap (the last keymap specified with :cm). Edit ex.c to set default ex variables or use EXINIT environment variable as shown under OPTIONS section. Edit vi.c to remap normal mode keybinds and led.c for insert.
:cm[ap][!] [kmap]
Without kmap, prints the current keymap name.
When kmap is specified, sets the alternate keymap to
kmap and, unless ! is given, switches to this keymap.
:ft [filetype]
Without filetype, prints the current file type and
reloads the highlight ft, makes it possible to reset
dynamic highlights created by options like "hlw".
When filetype is specified, sets the file type of the
current ex buffer.
New key mappings:
^a in normal mode: searches for the word under the cursor.
^p in insert mode: inserts the contents of the default yank buffer.
zL, zl, zr, and zR in normal mode: change the value of td option.
^e and ^f in insert mode: switch to the English and alternate keymap.
ze and zf in normal mode: switch to the English and alternate keymap.
gu, gU, and g~ in normal mode: switch character case.
^l in normal mode: updates terminal dimensions (after resizing it).
"Ever tried reading the source code?"
Yes, that is a lesser known feature, what did you expect?
Jokes aside (with a level of truth to it), these features exist in many other
vi implementations but neither man pages cover their functionality in an
understandable language, describe it here instead.
@@ macros:
@@ repeats the last macro on next line
substitution backreference:
This inserts the text of matched group specified by \x where x is
group number. Example:
this is an example text for subs and has int or void
:%s/(int)\|(void)/pre\0after
this is an example text for subs and has preintafter or void
:%s/(int)\|(void)/pre\2after/g
this is an example text for subs and has prepreafterafter or prevoidafter
ex global command:
same syntax as ex substitution command, but instead of replacement
string it takes an ex command after the / / enclosed regex.
Example: remove empty lines
:g/^$/d
Try doing similar with substitution command - will not work as removing '\n'
without deleting the line is invalid, but it will work with global command.
search motions:
? and / searches have the ability to be used as motions. This seems very
counter intuitive and one would have never ever figure out that this
feature even exists, unless noted. Even if you read the source code it's
very easy to miss. How to use: optionally specify vi_arg1
, specify the motion
using it's keybind, then do / or ? and type out the search term.
The motion ends on the first match by default (no vi_arg1
specified).
The optional vi_arg1
determines how many matches of the term to skip until the
motion ends. Example: you see that the next 10 lines have the word "int"
which is included 3 times. You want to delete text until the 3rd
instance of "int" keybind would be 3d/int . Likewise you can opt out
of the "specify motion" part and just use / or ? with vi_arg1
to perform
specific searches.
Majestic EXINIT environment variable
At the zenith of your vi/nextvi education you'll find that EXINIT can be
used to achieve arbitrary level of customization. Using new ex command "tp"
any sequence of vi/ex commands can be performed at startup. This is where
real "groking vi" starts.
Example 1:
There is a dictionary file (assume vi.c), which we always want to have indexed
at startup for autocomplete feature in 31.
export EXINIT="e ./vi.c|tp i|u|bx 1|bx"
The last "bx" commands delete the vi.c buffer. "u" can be used to unmark vi.c as being modified. To keep it around as a buffer remove the "bx" commands. Example 2: Load your shell's history into vi's history buffer and adjust the data such that it is usable by appending ! at the beginning of command and escaping the "|" pipes the way ex prompt expects them (see 61.)
export EXINIT="e /root/.ash_history|tp yG:p:%s/^/!\\\|%s/ \\| / \\\\\\\| /g
qq|bx 1|bx|ft"
Congratulations, vi has unofficially replaced your shell's frontend. Example 3: Setup some custom @@ macros in your favorite registers.
export EXINIT="e|tp io{
}kA|tp 1G|tp 2\"ayy"
This macro gets loaded into register a, when @a is executed the macro will create { and closing } below the cursor leaving cursor in insert mode in between the braces. This is something you would commonly do in C like programming language. - CRITICAL - the new line inside the EXINIT string is literal, it's best to store the export command in a .sh file and set it using: $ . ./init.sh Otherwise this examples 2 and 3 may not work!!!
To improve nextvi's performance, shaping, character reordering, and syntax highlighting can be disabled by defining the EXINIT environment variable as "se noshape | se noorder | se nohl | se td=+2".
Options supported in nextvi:
td
Current direction context. The following values are meaningful:
* +2: always left-to-right.
* +1: follow conf.c's dircontexts[]; left-to-right for others.
* -1: follow conf.c's dircontexts[]; right-to-left for others.
* -2: always right-to-left.
shape
If set (default), performs Arabic/Farsi letter shaping.
order
If set, reorder characters based on the rules defined
in conf.c.
hl
If set (default), text will be highlighted based on syntax
highlighting rules in conf.c.
hll
If set, highlight current line.
ai
As in vi(1).
aw
As in vi(1).
ic
As in vi(1).
Special marks:
* the position of the previous change
[ the first line of the previous change
] the last line of the previous change
Special yank buffers:
/ the previous search keyword
: the previous ex command
Stress test: 1. Compile both versions with gcc -O2 -g (no customizations done) 2. Open Nextvi's vi.c and hold ^d until the end of the file 3. Capture the results with cachegrind 4. To find out what these values mean see: https://valgrind.org/docs/manual/cg-manual.html 5. To create the most optimal exe, enable PGO optimizations by compling via ./build.sh pgobuild which can lead to a significant performance boost on some application specific tasks. Feel free to adjust build.sh and the sample data on which it's being trained on, though default probably already good enough. 6. To improve nextvi's performance, shaping, character reordering, and syntax highlighting can be disabled by defining the EXINIT environment variable as "se noshape | se noorder | se nohl | se td=+2".
NEATVI (5e1f787eec332dcdf9f3608c0745551d5de72ad4):
--------------------------------------------------------------------------------
| I refs: 1,041,771,489
| I1 misses: 151,693
| LLi misses: 1,340
| I1 miss rate: 0.01%
| LLi miss rate: 0.00%
|
| D refs: 679,943,343 (378,140,004 rd + 301,803,339 wr)
| D1 misses: 459,782 ( 222,004 rd + 237,778 wr)
| LLd misses: 7,973 ( 1,123 rd + 6,850 wr)
| D1 miss rate: 0.1% ( 0.1% + 0.1% )
| LLd miss rate: 0.0% ( 0.0% + 0.0% )
|
| LL refs: 611,475 ( 373,697 rd + 237,778 wr)
| LL misses: 9,313 ( 2,463 rd + 6,850 wr)
| LL miss rate: 0.0% ( 0.0% + 0.0% )
--------------------------------------------------------------------------------
NEXTVI:
--------------------------------------------------------------------------------
| I refs: 440,620,805
| I1 misses: 87,812
| LLi misses: 1,261
| I1 miss rate: 0.02%
| LLi miss rate: 0.00%
|
| D refs: 212,119,603 (141,695,629 rd + 70,423,974 wr)
| D1 misses: 116,061 ( 57,181 rd + 58,880 wr)
| LLd misses: 6,257 ( 1,890 rd + 4,367 wr)
| D1 miss rate: 0.1% ( 0.0% + 0.1% )
| LLd miss rate: 0.0% ( 0.0% + 0.0% )
|
| LL refs: 203,873 ( 144,993 rd + 58,880 wr)
| LL misses: 7,518 ( 3,151 rd + 4,367 wr)
| LL miss rate: 0.0% ( 0.0% + 0.0% )
--------------------------------------------------------------------------------
"All software sucks, but some do more than others."
- Kyryl Melekhin
"Educated decisions assert the quantitative quality first."
- Kyryl Melekhin
"It’s possible that I understand better what’s going on, or
it’s equally possible that I just think I do."
— Russ Cox
"Vigorous writing is concise. A sentence should contain no unnecessary words
[and] a paragraph no unnecessary sentences, for the same reason that a drawing
should have no unnecessary lines and a machine no unnecessary parts. This
requires not that the writer make all his sentences short, or that he avoid
all detail and treat his subjects only in outline, but that every word tell."
— Elements of Style, William Strunk, Jr. - 1918
Kyryl Melekhin (kyx0r)
Ali Gholami Rudi (aligrudi)
Maxime Bouillot (Arkaeriit)
Kyryl Melekhin (kyx0r)
Kyryl Melekhin (kyx0r)
Cédric (Vouivre)
Ali Gholami Rudi (vi https://github.com/aligrudi/neatvi)
ArmaanB (bsd test)
aabacchus (build.sh)
illiliti (build.sh)
git-bruh (feedback)
and all users, posters & haters :/